
[feature request] Enhance ocrd workspace bulk-add to add URL entries instead of files #1086

Closed
stweil opened this issue Sep 4, 2023 · 3 comments

@stweil
Contributor

stweil commented Sep 4, 2023

ocrd workspace bulk-add can be used to add ALTO XML files to an existing METS file (see example). The new entries refer to the local files that were added, but for production use on a web server, the METS file must contain a URL for each ALTO XML file. So after running the ocrd command, additional processing like in the example is required. It would be nice if that additional processing could be avoided.
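For illustration, such a post-processing step could look roughly like the following Python sketch. It is not the linked example and not part of OCR-D core. It assumes that bulk-add wrote the local entries as LOCTYPE="OTHER" OTHERLOCTYPE="FILE" with workspace-relative paths in the FULLTEXT file group; the helper name rewrite_flocats, the sample METS and the base URL https://example.org/alto/ are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces used in METS files
NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}
XLINK_HREF = "{%s}href" % NS["xlink"]


def rewrite_flocats(root, file_grp, base_url):
    """Turn local-file FLocat entries of one fileGrp into URL entries.

    Hypothetical helper: assumes local entries use LOCTYPE="OTHER"
    OTHERLOCTYPE="FILE" and hold paths relative to the workspace.
    Returns the number of rewritten entries.
    """
    count = 0
    for grp in root.iter("{%s}fileGrp" % NS["mets"]):
        if grp.get("USE") != file_grp:
            continue
        for flocat in grp.iter("{%s}FLocat" % NS["mets"]):
            if flocat.get("LOCTYPE") != "OTHER" or flocat.get("OTHERLOCTYPE") != "FILE":
                continue  # already a URL entry (or some other location type)
            path = flocat.get(XLINK_HREF)
            if path is None:
                continue
            # Prefix the relative path with the (hypothetical) public base URL
            flocat.set(XLINK_HREF, base_url.rstrip("/") + "/" + path.removeprefix("./"))
            flocat.set("LOCTYPE", "URL")
            del flocat.attrib["OTHERLOCTYPE"]
            count += 1
    return count


SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="FULLTEXT">
      <mets:file ID="FULLTEXT_0001" MIMETYPE="text/xml">
        <mets:FLocat xlink:href="FULLTEXT/0001.xml" LOCTYPE="OTHER" OTHERLOCTYPE="FILE"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""

if __name__ == "__main__":
    # Keep the familiar mets:/xlink: prefixes when serializing
    for prefix, uri in NS.items():
        ET.register_namespace(prefix, uri)
    root = ET.fromstring(SAMPLE)
    rewritten = rewrite_flocats(root, "FULLTEXT", "https://example.org/alto/")
    print(rewritten, ET.tostring(root, encoding="unicode"))
```

This rewrites entries in place instead of adding a second FLocat, so the resulting METS only refers to the web location.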

@kba kba self-assigned this Oct 12, 2023
@kba
Member

kba commented Oct 12, 2023

Good point. I need to fix the bulk-add mechanism to be compatible with #1079, and I'll implement it in such a way that it will be possible to set both the local filename and the remote URL.
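The idea of deriving both values from the same input path can be sketched in Python as below. This is only an illustration, not OCR-D core's implementation: the regular expression, the group names filegrp and pageno, the str.format-style templates and the helper name expand are all hypothetical, and bulk-add's actual placeholder syntax may differ.

```python
import re
from typing import Optional

# Hypothetical pattern: named groups extracted from each input path
PATTERN = re.compile(r"(?P<filegrp>[^/]+)/(?P<pageno>\d+)\.xml$")


def expand(path: str, local_tpl: Optional[str], url_tpl: Optional[str]) -> dict:
    """Fill the (hypothetical) local-filename and URL templates from one match.

    A template that is None is skipped, so only the requested locations are produced.
    """
    m = PATTERN.search(path)
    if m is None:
        raise ValueError(f"path does not match pattern: {path}")
    fields = m.groupdict()
    result = {}
    if local_tpl is not None:
        result["local_filename"] = local_tpl.format(**fields)
    if url_tpl is not None:
        result["url"] = url_tpl.format(**fields)
    return result


if __name__ == "__main__":
    print(expand("alto/00001.xml",
                 "{filegrp}/{pageno}.xml",
                 "https://example.org/{filegrp}/{pageno}.xml"))
```

Keeping the two templates independent means either location can be omitted without affecting the other.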

@kba kba closed this as completed in 7c978a2 Oct 12, 2023
kba added a commit that referenced this issue Oct 12, 2023
bulk-add: Distinguish url and local_filename, fix #1086
@stweil
Contributor Author

stweil commented Mar 8, 2024

It took some time, but now I have been able to use the new code in a real use case. Running ocrd workspace bulk-add works fine as long as I provide both the --url and --local-filename arguments. It then writes two mets:FLocat entries for each page, one with the file path and one with the URL. As the presentation only needs the URL, not the file path, I then tried running the command without a --local-filename argument. I had expected it to write only the desired mets:FLocat entry with the URL, but it again wrote the other entry as well, this time with a completely unusable file path:

    <mets:fileGrp USE="FULLTEXT">
      <mets:file ID="file-alto-idp251240016" MIMETYPE="text/xml">
        <mets:FLocat xlink:href="      &lt;mets:div ID=&quot;struct-physical-idp251240016&quot; CONTENTIDS=&quot;http:/diglib.hab.de/drucke/li-1876-1/start.htm?image=00001&quot; TYPE=&quot;page&quot; ORDER=&quot;1&quot;&gt;" LOCTYPE="OTHER" OTHERLOCTYPE="FILE"/>
        <mets:FLocat xlink:href="https://ub-backup.bib.uni-mannheim.de/~stweil/d-gt/data/DE-23/urn_nbn_de_gbv_23-drucke_li-1876-12/alto/00001.xml" LOCTYPE="URL"/>
      </mets:file>
    [...]

Would it be reasonable to change that, so that only a URL entry is added when only the --url argument is given?
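A minimal sketch of the requested behavior, using Python's xml.etree.ElementTree rather than OCR-D core's own METS model: the hypothetical helper make_file emits a file-based mets:FLocat only when a local filename is given and a URL-based one only when a URL is given. The attribute values mirror the output shown above; the IDs and URLs are made up.

```python
import xml.etree.ElementTree as ET
from typing import Optional

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"


def make_file(file_id: str, mimetype: str,
              local_filename: Optional[str] = None,
              url: Optional[str] = None) -> ET.Element:
    """Build a mets:file with one FLocat per location actually provided (hypothetical helper)."""
    if local_filename is None and url is None:
        raise ValueError("need at least a local filename or a URL")
    f = ET.Element(f"{{{METS}}}file", {"ID": file_id, "MIMETYPE": mimetype})
    if local_filename is not None:
        # Local path entry, only written when a local filename was requested
        ET.SubElement(f, f"{{{METS}}}FLocat", {
            f"{{{XLINK}}}href": local_filename,
            "LOCTYPE": "OTHER",
            "OTHERLOCTYPE": "FILE",
        })
    if url is not None:
        # Remote entry used by the presentation
        ET.SubElement(f, f"{{{METS}}}FLocat", {
            f"{{{XLINK}}}href": url,
            "LOCTYPE": "URL",
        })
    return f


if __name__ == "__main__":
    ET.register_namespace("mets", METS)
    ET.register_namespace("xlink", XLINK)
    only_url = make_file("file-alto-00001", "text/xml",
                         url="https://example.org/alto/00001.xml")
    print(ET.tostring(only_url, encoding="unicode"))
```

With this design, calling bulk-add with only --url would produce a single URL entry instead of an additional entry with a garbage file path.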

@bertsky
Collaborator

bertsky commented Mar 8, 2024

Note: you can do that with mm-update, too (it's what we use in the OCR-D Manager).

But I am also in favor of more flexibility in bulk-add. This was also mentioned in #1150 (and, somewhat related, in #1179).
