Use Solr OCR Highlighting Plugin in Search in Document Plugin #587

beatrycze-volk · 2021-02-11T15:30:21Z

The main aim of this PR is to implement Solr OCR Highlighting Plugin inside the existing Search in Document Plugin.

Tasks:

Changes which should be discussed:

Access-Control-Allow-Origin
Currently in PageViewProxy it is set. It causes problem when this header is also configured in .htaccess.
replace of submit button with standard function button in search form
For unknown reason submit form button with prevent page reload was not working for the partner, so it got changes to standard button which calls a method. It work for the partner now, but it has some implications for e.g. pressing Enter after writing search term simply submits form without executing search function. One of the possible solutions would be also map the key to the function, but it is not sure that there are no other implications not detected currently.

albig · 2021-02-11T15:57:54Z

What do I have to do to install/activate the OCR Highlighting Plugin? Does the code work, if it's not installed?

beatrycze-volk · 2021-02-11T16:18:56Z

What do I have to do to install/activate the OCR Highlighting Plugin? Does the code work, if it's not installed?

https://dbmdz.github.io/solr-ocrhighlighting/installation/

I would assume that it doesn't work if not installed. I have only tested with SOLR index which had the plugin installed.

albig · 2021-02-11T16:33:48Z

What do I have to do to install/activate the OCR Highlighting Plugin? Does the code work, if it's not installed?

https://dbmdz.github.io/solr-ocrhighlighting/installation/

I would assume that it doesn't work if not installed. I have only tested with SOLR index which had the plugin installed.

Ok. Than we have to include this information to the documentation.

And I will setup a testsystem with this Solr plugin. May take some days.

beatrycze-volk · 2021-03-15T14:50:24Z

Should I mark it as ready to merge or we also decide that some changes can't be merged in and I will create new branch in which I will implement new stuff without changing old ones (I mean way of submitting the form and cross origin access)?

albig · 2021-03-26T17:14:17Z

https://dbmdz.github.io/solr-ocrhighlighting/installation/

I would assume that it doesn't work if not installed. I have only tested with SOLR index which had the plugin installed.

You are right. It doesn't work out-of-the box. The search plugin still finds no snippets of course.The searchInDocument fails with exception.

I still don't like the idea to make the solr-ocrhighlighting plugin obligatory. And I will investigate howto upgrade an existing installation.

albig

I didn't manage to get the Solr working with

Solr 8.8
solr-ocrhighlighting 0.5.0 (current release)
schema.xml and solrconfig.xml from this PR
updated extension configuration for solr (using default values)

On indexing a document, I end up with the following error

org.apache.solr.common.SolrException: Exception writing document id 36LOG_0003 to the index; possible analysis error.
	at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:256)
	at org.apache.solr.update.processor.RunUpdateProcessorFactory$RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:73)
...
Caused by: java.lang.RuntimeException: Could not parse pointer: 
	at de.digitalcollections.solrocr.model.SourcePointer.parse(SourcePointer.java:81)
	at de.digitalcollections.solrocr.lucene.filters.ExternalUtf8ContentFilterFactory.create(ExternalUtf8ContentFilterFactory.java:42)
	at org.apache.solr.analysis.TokenizerChain.initReader(TokenizerChain.java:97)
...

Please check again and tell me your system setup to reproduce.

Classes/Common/SolrSearchResult/ResultDocument.php

Configuration/ApacheSolr/configsets/dlf/conf/solrconfig.xml

albig · 2021-03-31T12:25:00Z

Classes/Plugin/Tools/SearchInDocumentTool.php

+    {
+        $id = $this->doc->uid;
+
+        if (!empty($this->conf['documentIdUrlSchema'])) {


What's the purpose of this? Could you please provide an example string to verify the string splitting magic below?

Sometimes id of the document is actually not an id but a link to xml file. Then to search in the index the full link can't be used so it is needed to extract real id out of the full link.

Example:
https://dev-ddb.fiz-karlsruhe.de/api/items/F45KF24U7UIIVCUJJBV334PD7IPE7JBW/source/record -> https://dev-ddb.fiz-karlsruhe.de/api/items/*id*/source/record

I think we need to distinguish more cases here:

The normal case would be documents which are properly indexed in the TYPO3 database. Those don't need any special treatment for their ID anyway.

The "DFG-Viewer" case would be documents which are not indexed in the database nor in the Solr index. Normally those can be identified by the fact that they have an URL as ID.

Then there is the special "Zeitungsportal" case where we have documents, which are not indexed in the database, but are indexed in the Solr index. Those have an URL as ID as well, but need special treatment to extract the indexed ID from the URL.

I suggest using the TYPO3 hook mechanism to handle all those cases:

If a document has a numerical ID, no further treatment is necessary.

If a document has a string ID, this string is processed through a hook.

If the hook returns an integer, the document is treated as indexed.

If the hook returns the string unchanged, the document is considered unindexed and ignored when searching. (This would be the default if no hook is registered, e. g. in "DFG-Viewer mode".)

The advantages would be:

We don't need to add Zeitungportal-specific logic to the main extension. Instead the hook can be part of the supplemental Zeitungsportal extension.

The solution is extensible, because you can register more than one hook to handle different kinds of identifiers. So anybody with a setup of a pre-existing Solr index could map their document IDs to the index IDs and re-use their Solr index. (This could even be interesting for SLUB, when we think about consolidating our different indices - CMR, Find, Kitodo, ...)

Processing the ID string wouldn't be limited to splitting a string, because a hook could include more complex logic like regular expressions or even database lookups.

I suggest using the TYPO3 hook mechanism to handle all those cases:

Do you mean https://typo3.org/article/how-to-use-existing-hooks-in-your-own-extension ?

If a document has a string ID, this string is processed through a hook.
If the hook returns an integer, the document is treated as indexed.
If the hook returns the string unchanged, the document is considered unindexed and ignored when searching. (This would be the default if no hook is registered, e. g. in "DFG-Viewer mode".)

But how to convert string ID to an integer?

We actually already have some hooks in Kitodo.Presentation. A good example is the way how we extract the record ID from METS:
Normally we would take /mets:mets[@OBJID] as the document's unique record identifier:

kitodo-presentation/Classes/Common/MetsDocument.php

Lines 147 to 150 in f12eede

// Check for METS object @ID.

if (!empty($this->mets['OBJID'])) {

$this->recordId = (string) $this->mets['OBJID'];

}

But if this attribute is not set (which is the case for old versions of Kitodo.Production), we use hooks to have a fallback. The hook is registered in ext_localconf.php:

kitodo-presentation/ext_localconf.php

Line 219 in f12eede

$GLOBALS['TYPO3_CONF_VARS']['SC_OPTIONS']['dlf/Classes/Common/MetsDocument.php']['hookClass'][] = \Kitodo\Dlf\Hooks\KitodoProductionHacks::class;

and called in Classes/Common/MetsDocument.php:

kitodo-presentation/Classes/Common/MetsDocument.php

Lines 151 to 158 in f12eede

// Get hook objects.

$hookObjects = Helper::getHookObjects('Classes/Common/MetsDocument.php');

// Apply hooks.

foreach ($hookObjects as $hookObj) {

if (method_exists($hookObj, 'construct_postProcessRecordId')) {

$hookObj->construct_postProcessRecordId($this->xml, $this->recordId);

}

}

The hook itself can be found in Classes/Hooks/KitodoProductionHacks.php:

kitodo-presentation/Classes/Hooks/KitodoProductionHacks.php

Lines 37 to 68 in f12eede

public function construct_postProcessRecordId(\SimpleXMLElement &$xml, &$record_id)

{

if (!$record_id) {

$xml->registerXPathNamespace('mods', 'http://www.loc.gov/mods/v3');

// Get all logical structure nodes with metadata, but without associated METS-Pointers.

if (($divs = $xml->xpath('//mets:structMap[@TYPE="LOGICAL"]//mets:div[@DMDID and not(./mets:mptr)]'))) {

$smLinks = $xml->xpath('//mets:structLink/mets:smLink');

if (!empty($smLinks)) {

foreach ($smLinks as $smLink) {

$links[(string) $smLink->attributes('http://www.w3.org/1999/xlink')->from][] = (string) $smLink->attributes('http://www.w3.org/1999/xlink')->to;

}

foreach ($divs as $div) {

if (!empty($links[(string) $div['ID']])) {

$id = (string) $div['DMDID'];

break;

}

}

}

if (empty($id)) {

$id = (string) $divs[0]['DMDID'];

}

$dmdIds = explode(' ', $id);

foreach ($dmdIds as $dmdId) {

$recordIds = $xml->xpath('//mets:dmdSec[@ID="' . $dmdId . '"]//mods:mods/mods:recordInfo/mods:recordIdentifier');

if (!empty($recordIds)) {

$record_id = (string) $recordIds[0];

break;

}

}

}

}

}

Hook is a possibility. I think the modern way would be to make some "middleware". But I haven't done this before. Hook support itself seem to be not deprecated, although lots of core hooks are.

The DFG-Viewer case is not relevant in the SearchInDocument tool. This tool won't work with it anyway. The DFG-Viewer has it's own SRU search plugin, which requires a SRU server of course.

I suggest to first test and documentate this PR including indexing (doesn't work yet) and searching for Kitodo.Presentation alone. If this works, we can discuss further at this point.

Please add at least a sample of the documentIdUrlSchema setting. Or is this part of the documentation?

This should be already part of #630. But I still need some (inline) documentation about what is going on here, why we are doing this and what string you expect in documentIdUrlSchema.

beatrycze-volk · 2021-04-14T16:33:39Z

@albig: Is your error similar to something like that:

org.apache.solr.common.SolrException: Exception writing document id ocrdoc-1 to the index; possible analysis error: Illegal char <:> at index 5: https://digital.slub-dresden.de/data/kitodo/TheDarea_416971482-19100223/TheDarea_416971482-19100223_ocr/00000001.xml at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:247) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:76)

?

After research I have found out that indexing when XML files are stored in data folder of the core works correctly but when they are passed from external URL the error appears.

I have checked in the GItHub page of the plugin and found dbmdz/solr-ocrhighlighting#49:

Currently we only support loading field values/OCR fragments from the file system.

beatrycze-volk · 2021-04-14T18:20:51Z

I'm sorry for this few rebase pushes, I have made a bit of mess and needed to fix it. Now all commits should be correct.

albig · 2021-04-15T08:02:19Z

@albig: Is your error similar to something like that:

org.apache.solr.common.SolrException: Exception writing document id ocrdoc-1 to the index; possible analysis error: Illegal char <:> at index 5: https://digital.slub-dresden.de/data/kitodo/TheDarea_416971482-19100223/TheDarea_416971482-19100223_ocr/00000001.xml at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:247) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:76)

?

After research I have found out that indexing when XML files are stored in data folder of the core works correctly but when they are passed from external URL the error appears.

I have checked in the GItHub page of the plugin and found dbmdz/solr-ocrhighlighting#49:

Currently we only support loading field values/OCR fragments from the file system.

Yes, I think this was the error I've seen. So we have to wait for the fix in the ocrhighlighting plugin?

beatrycze-volk · 2021-04-15T11:34:02Z

Yes, I think this was the error I've seen. So we have to wait for the fix in the ocrhighlighting plugin?

Yes, of course if they are going to fix it actually. That's not so sure after reading the conversation in this issue.

Other possiblity is to store content of the xml files instead pointers to those files.

beatrycze-volk · 2021-08-16T11:37:38Z

So, the text highlighting only works with the SearchInDocument plugin used together with the pageview plugin AND the TypoScript configuration for the "Toolbox" must be included.

Currently yes. There is also possibility to implement it in the way that coordinates will be already passed in URL parameter.

Nevertheless, it's still not working. I'll add some remarks in the following review on the right lines.

Thanks!

albig

Please have look at my comments.

Resources/Public/Javascript/PageView/SearchInDocument.js

albig · 2021-08-16T11:44:39Z

Resources/Public/Javascript/PageView/SearchInDocument.js

+    for(var i = 0; i < queryParams.length; i++) {
+        var queryParam = queryParams[i].split('=');
+
+        if(queryParam[0] === $("input[id='tx-dlf-search-in-document-page']").attr('name')) {


This comparison doesn't respect the possible encoding of the parameters. This is wrong on other places, too.

If you click next/last page, the link parameter get's encoded:
/workview?tx_dlf%5Bdouble%5D=0&tx_dlf%5Bhighlight_word%5D=pieschen&tx_dlf%5Bid%5D=1&tx_dlf%5Bpage%5D=3&cHash=04b8a699974161fd87b83c9b36d65483

But 'tx_dlf%5Bpage%5D' === 'tx_dlf[page]' is not true so the highlighting fails.

Highlighting seems to work now. Thank you!

albig

With the patched ocr-highlighting plugin, I see the toplevel element. But there are some issues with the facets and the filter queries. Please see my comments.

Classes/Common/DocumentList.php

albig · 2021-08-17T15:04:06Z

Classes/Common/DocumentList.php

+        foreach ($filterQueries as $key => $value) {
+            if (isset($value['query'])) {
+                $filterQuery[$key] = $value['query'] . ' OR ' . $fields['toplevel'] . ':true';
+                $query->addFilterQuery($filterQuery);


This won't work either. But I found no way to test this. I think, there is another issue in Search:makeFacetsMenuArray(). So I can't get the available facets from Solr and show the facets menu.

I have taken look to this method. Do you think that it needs to be adjusted now or it can be done later?

The addFilterQuery() above has to be fixed. But before doing this, I'm looking for working facets. Currently, the Solr has no facets available and I still don't know, why.

A query like

/solr/dlfCore0/select?facet=on&q=*%3A*&rows=0

Should return something like

{ "responseHeader":{ "status":0, "QTime":2, "params":{ "q":"*:*", "rows":"0", "facet":"on"}}, "response":{"numFound":4517,"start":0,"numFoundExact":true,"docs":[] }, "facet_counts":{ "facet_queries":{}, "facet_fields":{}, "facet_ranges":{}, "facet_intervals":{}, "facet_heatmaps":{}}}

In my case - with this PR working - I get

{ "responseHeader":{ "status":0, "QTime":0, "params":{ "q":"*:*", "rows":"0", "facet":"on"}}, "response":{"numFound":4376,"start":0,"numFoundExact":true,"docs":[] }}

That's why no facetting is shown.

The faceting is solved (see other commit suggest).

With selecting a facet, we get the (expected) exception:

"A filterquery must have a key value" in Line 283.

The used GET-parameter are:

tx_dlf[fq][0]=collection_faceting:("Projekt: Illustrierte Magazine der Klassischen Moderne")&tx_dlf[fq][1]=type_faceting:("volume")&tx_dlf[query]=*

So please rewrite Line 283.

Classes/Common/DocumentList.php

Co-authored-by: Alexander Bigga <[email protected]>

Classes/Common/Indexer.php

This line is not needed.

albig

Maybe the last change request for now. Please have a look at the comments.

Configuration/ApacheSolr/configsets/dlf/conf/solrconfig.xml

albig · 2021-08-18T12:39:29Z

Classes/Common/DocumentList.php

+        foreach ($filterQueries as $key => $value) {
+            if (isset($value['query'])) {
+                $filterQuery[$key] = $value['query'] . ' OR ' . $fields['toplevel'] . ':true';
+                $query->addFilterQuery($filterQuery);


The faceting is solved (see other commit suggest).

With selecting a facet, we get the (expected) exception:

"A filterquery must have a key value" in Line 283.

The used GET-parameter are:

tx_dlf[fq][0]=collection_faceting:("Projekt: Illustrierte Magazine der Klassischen Moderne")&tx_dlf[fq][1]=type_faceting:("volume")&tx_dlf[query]=*

So please rewrite Line 283.

Classes/Plugin/Eid/SearchInDocument.php

Co-authored-by: Alexander Bigga <[email protected]>

beatrycze-volk · 2021-08-18T13:44:20Z

With selecting a facet, we get the (expected) exception:

"A filterquery must have a key value" in Line 283.

Is $key not a key value?

Classes/Common/DocumentList.php

Co-authored-by: Alexander Bigga <[email protected]>

albig · 2021-08-19T16:33:35Z

I think we finally reached the state, all features seems to work and I can have a look at the performance of the new ocr highlighting plugin and about the "migration" path from last release to next release containing this feature. So maybe, there is coming another change request - or the final merge in new future.

albig · 2021-09-06T15:42:58Z

Just to give an update.

Indexing with the same set of documents seems to work with this PR. But there are missing some pages. Obviously the ocrhighlight plugin doesn't like pages with the <Illustration> tag.

Example:
https://digital.slub-dresden.de/data/kitodo/lebe_364572701-19320800/lebe_364572701-19320800_ocr/00000068.xml

at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: Failed to parse the OCR markup, make sure your files are well-formed and your regions start/end on complete tags! (Source was: [unknown])
	at de.digitalcollections.solrocr.formats.OcrParser.(OcrParser.java:109)
	at de.digitalcollections.solrocr.formats.alto.AltoParser.(AltoParser.java:27)
	at de.digitalcollections.solrocr.formats.alto.AltoFormat.getParser(AltoFormat.java:38)
	at de.digitalcollections.solrocr.model.OcrFormat.filter(OcrFormat.java:90)
	at de.digitalcollections.solrocr.lucene.filters.OcrCharFilterFactory.create(OcrCharFilterFactory.java:51)

The other point is, that the index size is 3 times (!) bigger than before.

albig · 2021-09-20T15:47:13Z

Just to give an update.

I'm again on this PR. Just a new update.

Indexing with the same set of documents seems to work with this PR. But there are missing some pages. Obviously the ocrhighlight plugin doesn't like pages with the <Illustration> tag.

It has been to very long <Polygon> element. See dbmdz/solr-ocrhighlighting#210. This has been fixed.

The other point is, that the index size is 3 times (!) bigger than before.

Storing the whole ALTO is not that nice. This PR ignores additionally the ALTO namespace. Not sure, if we should go this way.

If we store only the suggested miniOcr-format, the index has "only" double size than before.

beatrycze-volk · 2021-09-20T15:52:52Z

Just to give an update.

I'm again on this PR. Just a new update.

Indexing with the same set of documents seems to work with this PR. But there are missing some pages. Obviously the ocrhighlight plugin doesn't like pages with the <Illustration> tag.

It has been to very long <Polygon> element. See dbmdz/solr-ocrhighlighting#210. This has been fixed.

I was actually also on this PR. Right after push my 'fix' commit I have seen this comment. Probably I will need to delete it.

The other point is, that the index size is 3 times (!) bigger than before.

Storing the whole ALTO is not that nice. This PR ignores additionally the ALTO namespace. Not sure, if we should go this way.

If we store only the suggested miniOcr-format, the index has "only" double size than before.

Ok. Would it be our approach now to convert files to miniOCR? Should it be done in our code or by some external tool?

albig · 2021-09-20T21:10:35Z

I was actually also on this PR. Right after push my 'fix' commit I have seen this comment. Probably I will need to delete it.

Yes, I think so. This should'nt be fixed on our side.

Ok. Would it be our approach now to convert files to miniOCR? Should it be done in our code or by some external tool?

I've done this in our code because this miniOCR is no standard format.

I'd suggest the following procedure to continue:

I merge Use Solr OCR Highlighting Plugin in Search in Document Plugin #587 and Remove obsolete hl parameter use in image highlighting #639 into a new feature branch add-ocrhighlighting-plugin
I'll add my prototype code for miniOCR to this branch
We can discuss the best solution in a new PR add-ocrhighlighting-plugin -> master

beatrycze-volk · 2021-09-21T09:08:18Z

I merge Use Solr OCR Highlighting Plugin in Search in Document Plugin #587 and Remove obsolete hl parameter use in image highlighting #639 into a new feature branch add-ocrhighlighting-plugin

I'll add my prototype code for miniOCR to this branch

We can discuss the best solution in a new PR add-ocrhighlighting-plugin -> master

Ok!

beatrycze-volk requested review from sebastian-meyer and albig February 11, 2021 15:30

albig added this to the Kitodo.Presentation 4.0.0 milestone Feb 11, 2021

beatrycze-volk force-pushed the use-solr-highlighting branch from 7964285 to c923341 Compare February 11, 2021 16:33

beatrycze-volk force-pushed the use-solr-highlighting branch 2 times, most recently from a32763a to c924cd1 Compare February 15, 2021 12:49

albig added the ⚙ feature A new feature or enhancement. label Feb 17, 2021

sebastian-meyer assigned beatrycze-volk Feb 17, 2021

beatrycze-volk force-pushed the use-solr-highlighting branch from c924cd1 to e236489 Compare February 22, 2021 10:10

beatrycze-volk force-pushed the use-solr-highlighting branch from 0498523 to 97ebdaa Compare March 15, 2021 15:08

beatrycze-volk marked this pull request as ready for review March 22, 2021 15:12

albig suggested changes Mar 26, 2021

View reviewed changes

Classes/Common/SolrSearchResult/ResultDocument.php Outdated Show resolved Hide resolved

Classes/Common/SolrSearchResult/ResultDocument.php Outdated Show resolved Hide resolved

Configuration/ApacheSolr/configsets/dlf/conf/solrconfig.xml Outdated Show resolved Hide resolved

beatrycze-volk force-pushed the use-solr-highlighting branch 2 times, most recently from 200fe81 to a4f5373 Compare March 29, 2021 15:46

albig reviewed Mar 31, 2021

View reviewed changes

beatrycze-volk force-pushed the use-solr-highlighting branch 3 times, most recently from 45ebbb9 to 1c7c1f3 Compare April 14, 2021 18:10

beatrycze-volk mentioned this pull request Apr 19, 2021

Solr highlighting doesn't work with certain search terms. #502

Closed

beatrycze-volk force-pushed the use-solr-highlighting branch 2 times, most recently from fbe6380 to 14c4e9f Compare April 28, 2021 15:59

albig suggested changes Aug 16, 2021

View reviewed changes

Apply review comments related to URL parameters

1582cf7

albig mentioned this pull request Aug 17, 2021

Problem with the retrieving results, when some of the results contain empty full text field dbmdz/solr-ocrhighlighting#200

Closed

albig suggested changes Aug 17, 2021

View reviewed changes

beatrycze-volk and others added 3 commits August 17, 2021 17:46

Remove overwrite of query

ee2b704

Co-authored-by: Alexander Bigga <[email protected]>

Use create filter query

11b500e

Co-authored-by: Alexander Bigga <[email protected]>

Fix the query limit

d115ef4

Co-authored-by: Alexander Bigga <[email protected]>

albig reviewed Aug 18, 2021

View reviewed changes

Classes/Common/Indexer.php Outdated Show resolved Hide resolved

Update Classes/Common/Indexer.php

8448869

This line is not needed.

albig suggested changes Aug 18, 2021

View reviewed changes

beatrycze-volk and others added 2 commits August 18, 2021 15:27

Ignore false-positive warning from Scrutinizer

399dbef

Co-authored-by: Alexander Bigga <[email protected]>

Add facet component to solrconfig.xml

d983735

Co-authored-by: Alexander Bigga <[email protected]>

albig reviewed Aug 18, 2021

View reviewed changes

Classes/Common/DocumentList.php Show resolved Hide resolved

beatrycze-volk and others added 2 commits August 18, 2021 18:55

Use array for filter queyr

11a796e

Co-authored-by: Alexander Bigga <[email protected]>

Merge branch 'master' into use-solr-highlighting

10fb2f6

Remove Illustration tag from full text XML

66bbed3

Merge branch 'master' into use-solr-highlighting

c2d6517

albig changed the base branch from master to ocr-highlighting-plugin September 21, 2021 09:07

albig modified the milestones: Kitodo.Presentation 4.0.0, Kitodo.Presentation 3.3.0 Sep 21, 2021

albig merged commit 9432644 into kitodo:ocr-highlighting-plugin Sep 21, 2021

beatrycze-volk deleted the use-solr-highlighting branch September 21, 2021 12:17

	// Check for METS object @ID.
	if (!empty($this->mets['OBJID'])) {
	$this->recordId = (string) $this->mets['OBJID'];
	}

	// Get hook objects.
	$hookObjects = Helper::getHookObjects('Classes/Common/MetsDocument.php');
	// Apply hooks.
	foreach ($hookObjects as $hookObj) {
	if (method_exists($hookObj, 'construct_postProcessRecordId')) {
	$hookObj->construct_postProcessRecordId($this->xml, $this->recordId);
	}
	}

	public function construct_postProcessRecordId(\SimpleXMLElement &$xml, &$record_id)
	{
	if (!$record_id) {
	$xml->registerXPathNamespace('mods', 'http://www.loc.gov/mods/v3');
	// Get all logical structure nodes with metadata, but without associated METS-Pointers.
	if (($divs = $xml->xpath('//mets:structMap[@TYPE="LOGICAL"]//mets:div[@DMDID and not(./mets:mptr)]'))) {
	$smLinks = $xml->xpath('//mets:structLink/mets:smLink');
	if (!empty($smLinks)) {
	foreach ($smLinks as $smLink) {
	$links[(string) $smLink->attributes('http://www.w3.org/1999/xlink')->from][] = (string) $smLink->attributes('http://www.w3.org/1999/xlink')->to;
	}
	foreach ($divs as $div) {
	if (!empty($links[(string) $div['ID']])) {
	$id = (string) $div['DMDID'];
	break;
	}
	}
	}
	if (empty($id)) {
	$id = (string) $divs[0]['DMDID'];
	}
	$dmdIds = explode(' ', $id);
	foreach ($dmdIds as $dmdId) {
	$recordIds = $xml->xpath('//mets:dmdSec[@ID="' . $dmdId . '"]//mods:mods/mods:recordInfo/mods:recordIdentifier');
	if (!empty($recordIds)) {
	$record_id = (string) $recordIds[0];
	break;
	}
	}
	}
	}
	}

Use Solr OCR Highlighting Plugin in Search in Document Plugin #587

Use Solr OCR Highlighting Plugin in Search in Document Plugin #587

Conversation

beatrycze-volk commented Feb 11, 2021 • edited Loading

albig commented Feb 11, 2021

beatrycze-volk commented Feb 11, 2021

albig commented Feb 11, 2021

beatrycze-volk commented Mar 15, 2021 • edited Loading

albig commented Mar 26, 2021

albig left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

beatrycze-volk commented Apr 14, 2021

beatrycze-volk commented Apr 14, 2021

albig commented Apr 15, 2021

beatrycze-volk commented Apr 15, 2021

beatrycze-volk commented Aug 16, 2021

albig left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

albig left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

albig Aug 18, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

albig left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

beatrycze-volk commented Aug 18, 2021

albig commented Aug 19, 2021

albig commented Sep 6, 2021

albig commented Sep 20, 2021

beatrycze-volk commented Sep 20, 2021

albig commented Sep 20, 2021

beatrycze-volk commented Sep 21, 2021

beatrycze-volk commented Feb 11, 2021 •

edited

Loading

beatrycze-volk commented Mar 15, 2021 •

edited

Loading

albig Aug 18, 2021 •

edited

Loading