ocrd workspace: file paths should be made relative to self.directory #1213

Open
bertsky opened this issue Apr 17, 2024 · 1 comment

Collaborator

bertsky commented Apr 17, 2024

On the CLI, the user expects paths to be resolved once (on the input side) and, from then on, everything to be handled relative to the workspace automatically (whenever possible). So if a file path is absolute, its workspace prefix should be stripped for the METS `FLocat`. If it is relative, it is interpreted relative to the CWD, but then re-expressed relative to the workspace.

Unfortunately, that is not what we have implemented yet, especially when combined with the global `-d path/to/workspace` option (so that the CWD differs from the workspace directory).
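A minimal sketch of the expected behavior, for illustration only: `path_for_mets` is a hypothetical helper (not part of OCR-D core) that interprets a relative path against the caller's CWD once, then strips the workspace prefix if the result lies inside the workspace and otherwise keeps it absolute. It uses purely lexical normalization (`os.path.normpath`) and does not follow symlinks, which is an assumption about the desired semantics; the `/data` CWD in the usage line is made up.

```python
import os
from pathlib import Path
from typing import Optional


def path_for_mets(fname: str, workspace_dir: str, cwd: Optional[str] = None) -> str:
    """Hypothetical helper, not OCR-D API: resolve a user-supplied path once
    (from the CWD) and express it relative to the workspace where possible."""
    cwd = cwd if cwd is not None else os.getcwd()
    # os.path.join keeps `fname` unchanged if it is already absolute;
    # otherwise it is anchored at the caller's CWD, not at the workspace.
    absolute = Path(os.path.normpath(os.path.join(cwd, fname)))
    # The workspace given via -d may itself be relative to the CWD.
    workspace = Path(os.path.normpath(os.path.join(cwd, workspace_dir)))
    try:
        # Inside the workspace: strip the workspace prefix for the METS FLocat.
        return absolute.relative_to(workspace).as_posix()
    except ValueError:
        # Outside the workspace: keep the absolute path.
        return absolute.as_posix()


# Invoked as `ocrd workspace -d nd1967-12-14_var1/ ...` from a hypothetical /data:
print(path_for_mets("nd1967-12-14_var1/TEXTRACT_PAGE/p1.xml",
                    "nd1967-12-14_var1/", cwd="/data"))  # TEXTRACT_PAGE/p1.xml
```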

For example, in add_file, we currently have:

```python
if not isabs(fname) and exists(join(ctx.directory, fname)):
    fname = join(ctx.directory, fname)
```

That is, the workspace directory gets prepended without any regard to the caller's CWD.
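To make the consequence concrete, the following sketch copies the quoted condition into a hypothetical function and sets up two distinct files with the same relative name, one below the CWD and one below the workspace (all directory names are made up for the demonstration):

```python
import os
import tempfile
from os.path import exists, isabs, join


def current_add_file_resolution(fname: str, directory: str) -> str:
    # Same condition as the add_file snippet quoted above: if a file of that
    # name exists inside the workspace, the workspace directory is prepended,
    # no matter where the caller's CWD is.
    if not isabs(fname) and exists(join(directory, fname)):
        fname = join(directory, fname)
    return fname


with tempfile.TemporaryDirectory() as cwd:
    workspace = join(cwd, "ws")  # as if invoked with `-d ws` from `cwd`
    # Create img/page.tif both below the CWD and below the workspace.
    for base in (cwd, workspace):
        os.makedirs(join(base, "img"))
        open(join(base, "img", "page.tif"), "w").close()
    resolved = current_add_file_resolution("img/page.tif", workspace)
    intended = join(cwd, "img/page.tif")  # what the CLI user typed, read from the CWD
    print(resolved == intended)  # False: the workspace copy wins
```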

Another example is in bulk_add, where we do:

```python
if src_path_option:
    src_path = src_path_option
    for group_name in group_dict:
        src_path = src_path.replace('{{ %s }}' % group_name, group_dict[group_name])
    srcpath = Path(src_path)
else:
    srcpath = file_path
```

Here, again, no distinction is made between the CWD and `self.directory`.
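One possible direction, sketched under the assumption that the CWD is the right anchor for user-supplied paths: `bulk_add_srcpath` is a hypothetical, standalone rewrite of the snippet above (not the actual `bulk_add` code). It fills in the `{{ group }}` placeholders as before, but then anchors the result at the CWD, leaving the relativization against the workspace to a later step.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def bulk_add_srcpath(file_path: str, group_dict: Dict[str, str],
                     src_path_option: Optional[str] = None,
                     cwd: Optional[str] = None) -> Path:
    """Hypothetical variant of the bulk_add snippet: the source path is
    anchored at the caller's CWD once instead of being left ambiguous."""
    cwd = cwd if cwd is not None else os.getcwd()
    if src_path_option:
        src_path = src_path_option
        # Fill in the named regex groups, e.g. '{{ page }}' -> '0001'.
        for group_name, value in group_dict.items():
            src_path = src_path.replace('{{ %s }}' % group_name, value)
    else:
        src_path = file_path
    # Relative paths are anchored at the CWD; absolute paths pass through.
    return Path(os.path.normpath(os.path.join(cwd, src_path)))
```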

In the latter case (`bulk-add`), what makes things worse is that the glob pattern is used verbatim if it expands to nothing:

```python
expanded = glob(fglob)
if not expanded:
    file_paths += [Path(fglob)]
```

So if one tries to compensate for the path-resolution issue above by passing a glob relative to the workspace (which matches nothing relative to the CWD), the pattern itself ends up in the METS as a verbatim path entry containing the asterisk:

```
ocrd workspace -d nd1967-12-14_var1/ bulk-add -r "TEXTRACT_PAGE/(?P<page>.*)[.]xml" -G TEXTRACT_PAGE -g '{{ page }}' "TEXTRACT_PAGE/*.xml"
INFO ocrd.cli.workspace.bulk-add - [   1/1] TEXTRACT_PAGE/*.xml
```
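A hedged sketch of a stricter expansion: `expand_file_glob` is hypothetical and not the actual bulk-add code. It evaluates the pattern relative to the CWD, and if nothing matches while the pattern contains wildcard characters (`*`, `?`, `[`), it raises instead of passing the literal pattern on to the METS. Treating a wildcard-free argument as a single literal path is an assumption carried over from the current fallback.

```python
import glob
import os
import tempfile
from pathlib import Path
from typing import List, Optional

GLOB_MAGIC = set('*?[')


def expand_file_glob(fglob: str, cwd: Optional[str] = None) -> List[Path]:
    """Hypothetical replacement for the verbatim fallback in bulk-add."""
    cwd = cwd if cwd is not None else os.getcwd()
    # Escape the CWD so that special characters in it are not treated as wildcards.
    expanded = sorted(glob.glob(os.path.join(glob.escape(cwd), fglob)))
    if expanded:
        return [Path(p) for p in expanded]
    if any(c in fglob for c in GLOB_MAGIC):
        # A wildcard pattern that matched nothing must never become a METS entry.
        raise FileNotFoundError(f"pattern {fglob!r} matched no files below {cwd}")
    # No wildcard: keep it as a single literal path, anchored at the CWD.
    return [Path(os.path.join(cwd, fglob))]


with tempfile.TemporaryDirectory() as d:
    for name in ("a.xml", "b.xml"):
        Path(d, name).touch()
    matched = [p.name for p in expand_file_glob("*.xml", cwd=d)]
    print(matched)  # ['a.xml', 'b.xml']
    try:
        expand_file_glob("TEXTRACT_PAGE/*.xml", cwd=d)
        raised = False
    except FileNotFoundError as err:
        print(err)
        raised = True
```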
Collaborator Author

bertsky commented Sep 2, 2024

The same is true for the `--mets-server-url` argument: relative paths work neither relative to the workspace nor relative to the CWD.

(Again, the user expectation is that whatever resolves at startup time from the user's CWD should be kept.)
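A sketch of what startup-time resolution could look like, assuming that `--mets-server-url` takes either an HTTP(S) URL or a filesystem path to a Unix socket. `resolve_mets_server_url` is a hypothetical helper: it leaves URLs untouched and pins socket paths to the CWD at startup, so that later changes of directory do not affect them.

```python
import os
from typing import Optional


def resolve_mets_server_url(value: Optional[str], cwd: Optional[str] = None) -> Optional[str]:
    """Hypothetical startup-time normalization of --mets-server-url."""
    if value is None:
        return None
    if value.startswith(("http://", "https://")):
        # Network URLs do not depend on the CWD; keep them unchanged.
        return value
    # Assumed to be a socket path: resolve it against the CWD once, at startup.
    cwd = cwd if cwd is not None else os.getcwd()
    return os.path.normpath(os.path.join(cwd, value))
```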
