Refactored validate-refs to use new OpenSearch Serverless and registry-common library #906
Stuck at a bug that requires the validate POM to include jakarta. If we are moving validate-refs to harvest (my recommendation, because they share a large amount of code via config), then this would be the time to do it. |
@al-niessner let's move forward with adding this JAR, and then we can create a ticket down the road to refactor as needed. I like validate-refs being in the same package as validate because users don't want to download two packages to do "validation". But we can climb that tree when we get to it. |
@al-niessner also looks like the tests are failing here. |
This cannot build until we update registry-common and harvest. Even once it is complete, it cannot work here or in the wild until those repos are updated. Currently, it only works in my dev environment. |
@al-niessner I think we have registry-common merged and some data should be in the AWS OpenSearch Serverless. Do we want to get back onto this task? |
Sure. Still stuck on the big question: do we move the reference checker to harvest, since it shares the configuration, or make the validate artifact dependent on the harvest artifact? I still suggest moving the reference checker to harvest since it has zero dependencies on validate. I understand the grouping issue. I suggest solving that with a pseudo artifact that installs many tools in some grouping, rather than actually trying to group things by use instead of by software dependency. |
Moved to using registry-common and harvest for the tools they provide. harvest provides the configuration reader while the registry-common provides the interface to any opensearch. Updated the logic to use the new tools which made the unit tests invalid. Removed all of the unit test code for RI since it is no longer applicable. The logic of RI has remained. What has changed is how the interface to opensearch is used.
@al-niessner looks like this doesn't compile:
Compilation failure
[ERROR] /Users/jpadams/proj/pds/pdsen/workspace/validate/src/main/java/gov/nasa/pds/validate/ri/AuthInformation.java:[6,32] package gov.nasa.pds.harvest.cfg does not exist
|
Harvest pom?
|
You were not at the latest versions. Updated but now buildbot cannot find the latest versions of registry-common (2.1.0) or harvest (4.1.0). Get those out and you should be alright. |
@al-niessner now that I figured out how to get it to compile once we update registry-common and harvest, how do I run this? do I need to give it a harvest config? can I still give it what I gave it before (a txt file with a URL and a path to an auth file)? |
The
|
User must now supply the connection URL and auth file.
Should work with -a and -r now. I cleaned up the CLI help too. Yanked dependency on harvest from POM. Good luck. |
Still not stable. I do not understand this at all; meaning, it is more fun than usual because it is so challenging. I am doing this locally |
@al-niessner I think it is just |
It is ready for your testing again. I have no idea what secret it thinks we have revealed, but cannot believe there is anything really there. |
Looks good:
[WARNING] Tests run: 260, Failures: 0, Errors: 0, Skipped: 23
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 09:12 min
[INFO] Finished at: 2024-10-31T13:18:40-05:00
[INFO] ------------------------------------------------------------------------
@al-niessner I will update this secrets baseline, but here are the instructions for recreating and auditing the secrets baseline. |
FYI, the instructions are here: https://github.com/NASA-PDS/nasa-pds.github.io/wiki/Git-and-Github-Guide#detect-secrets |
Thanks for the link. I clicked on the one in the actions console and it gave me a 404. |
@al-niessner a few issues arose when trying to test this:
- As a very novice user, I'm not sure what this value is supposed to be:
-r,--registry-connection <registry-connection>   URL point to the registry connection
                                                 information usually of the form
                                                 app://connection/direct/localhost.xml
Is this the AOSS URL? Does it need to contain the path to the specific registry/registries that I want to query against? Is there any way we can default this to our production registry URL, so the end user could just override it if they are testing or want to use something else?
- When I gave it what I thought I was supposed to put there, I get an unhandled exception:
12:09:13.232 [main] FATAL gov.nasa.pds.validate.ri.OpensearchDocument - Error reading from URL: uninitialized connection factory
jakarta.xml.bind.UnmarshalException: null
at org.glassfish.jaxb.runtime.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:221) ~[jaxb-runtime-4.0.2.jar:4.0.2 - d104f19]
at org.glassfish.jaxb.runtime.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:189) ~[jaxb-runtime-4.0.2.jar:4.0.2 - d104f19]
at jakarta.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:146) ~[jakarta.xml.bind-api-4.0.1.jar:4.0.1]
at jakarta.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:151) ~[jakarta.xml.bind-api-4.0.1.jar:4.0.1]
at jakarta.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:161) ~[jakarta.xml.bind-api-4.0.1.jar:4.0.1]
at gov.nasa.pds.registry.common.connection.RegistryConnectionContent.from(RegistryConnectionContent.java:26) ~[registry-common-2.1.0-20241024.032014-5.jar:?]
at gov.nasa.pds.registry.common.EstablishConnectionFactory.from(EstablishConnectionFactory.java:19) ~[registry-common-2.1.0-20241024.032014-5.jar:?]
at gov.nasa.pds.registry.common.EstablishConnectionFactory.from(EstablishConnectionFactory.java:15) ~[registry-common-2.1.0-20241024.032014-5.jar:?]
at gov.nasa.pds.validate.ri.AuthInformation.getConnectionFactory(AuthInformation.java:30) ~[validate-3.6.0-SNAPSHOT.jar:?]
at gov.nasa.pds.validate.ri.OpensearchDocument.load(OpensearchDocument.java:25) [validate-3.6.0-SNAPSHOT.jar:?]
at gov.nasa.pds.validate.ri.OpensearchDocument.exists(OpensearchDocument.java:81) [validate-3.6.0-SNAPSHOT.jar:?]
at gov.nasa.pds.validate.ri.Cylinder.run(Cylinder.java:40) [validate-3.6.0-SNAPSHOT.jar:?]
at gov.nasa.pds.validate.ri.Engine.processQueueUntilEmpty(Engine.java:69) [validate-3.6.0-SNAPSHOT.jar:?]
at gov.nasa.pds.validate.ri.CommandLineInterface.process(CommandLineInterface.java:120) [validate-3.6.0-SNAPSHOT.jar:?]
at gov.nasa.pds.validate.ReferenceIntegrityMain.main(ReferenceIntegrityMain.java:14) [validate-3.6.0-SNAPSHOT.jar:?]
Caused by: java.io.FileNotFoundException: https://b3rqys09xmx9i19yn64i.us-west-2.aoss.amazonaws.com
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:2032) ~[?:?]
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1625) ~[?:?]
at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:224) ~[?:?]
at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source) ~[xercesImpl-2.12.2.jar:2.12.2]
at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source) ~[xercesImpl-2.12.2.jar:2.12.2]
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) ~[xercesImpl-2.12.2.jar:?]
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) ~[xercesImpl-2.12.2.jar:?]
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) ~[xercesImpl-2.12.2.jar:?]
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source) ~[xercesImpl-2.12.2.jar:?]
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source) ~[xercesImpl-2.12.2.jar:?]
at org.glassfish.jaxb.runtime.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:218) ~[jaxb-runtime-4.0.2.jar:4.0.2 - d104f19]
... 14 more
12:09:13.237 [main] ERROR Reference Integrity - The given lidvid 'urn:jaxa:darts:hyb2::2.0' is missing from the database.
12:09:13.349 [Thread-0] ERROR gov.nasa.pds.validate.ri.CommandLineInterface - Had an error communicating with opensearch
jakarta.xml.bind.UnmarshalException: null
at org.glassfish.jaxb.runtime.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:221) ~[jaxb-runtime-4.0.2.jar:4.0.2 - d104f19]
at org.glassfish.jaxb.runtime.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:189) ~[jaxb-runtime-4.0.2.jar:4.0.2 - d104f19]
at jakarta.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:146) ~[jakarta.xml.bind-api-4.0.1.jar:4.0.1]
at jakarta.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:151) ~[jakarta.xml.bind-api-4.0.1.jar:4.0.1]
at jakarta.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:161) ~[jakarta.xml.bind-api-4.0.1.jar:4.0.1]
at gov.nasa.pds.registry.common.connection.RegistryConnectionContent.from(RegistryConnectionContent.java:26) ~[registry-common-2.1.0-20241024.032014-5.jar:?]
at gov.nasa.pds.registry.common.EstablishConnectionFactory.from(EstablishConnectionFactory.java:19) ~[registry-common-2.1.0-20241024.032014-5.jar:?]
at gov.nasa.pds.registry.common.EstablishConnectionFactory.from(EstablishConnectionFactory.java:15) ~[registry-common-2.1.0-20241024.032014-5.jar:?]
at gov.nasa.pds.validate.ri.AuthInformation.getConnectionFactory(AuthInformation.java:30) ~[validate-3.6.0-SNAPSHOT.jar:?]
at gov.nasa.pds.validate.ri.DuplicateFileAreaFilenames.findDuplicates(DuplicateFileAreaFilenames.java:44) [validate-3.6.0-SNAPSHOT.jar:?]
at gov.nasa.pds.validate.ri.DuplicateFileAreaFilenames.run(DuplicateFileAreaFilenames.java:36) [validate-3.6.0-SNAPSHOT.jar:?]
at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: java.io.FileNotFoundException: https://b3rqys09xmx9i19yn64i.us-west-2.aoss.amazonaws.com
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:2032) ~[?:?]
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1625) ~[?:?]
at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:224) ~[?:?]
at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source) ~[xercesImpl-2.12.2.jar:2.12.2]
at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source) ~[xercesImpl-2.12.2.jar:2.12.2]
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) ~[xercesImpl-2.12.2.jar:?]
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) ~[xercesImpl-2.12.2.jar:?]
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) ~[xercesImpl-2.12.2.jar:?]
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source) ~[xercesImpl-2.12.2.jar:?]
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source) ~[xercesImpl-2.12.2.jar:?]
at org.glassfish.jaxb.runtime.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:218) ~[jaxb-runtime-4.0.2.jar:4.0.2 - d104f19]
... 11 more
12:09:13.554 [main] FATAL gov.nasa.pds.validate.ri.OpensearchDocument - Error reading from URL: uninitialized connection factory
jakarta.xml.bind.UnmarshalException: null
at org.glassfish.jaxb.runtime.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:221) ~[jaxb-runtime-4.0.2.jar:4.0.2 - d104f19]
at org.glassfish.jaxb.runtime.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:189) ~[jaxb-runtime-4.0.2.jar:4.0.2 - d104f19]
at jakarta.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:146) ~[jakarta.xml.bind-api-4.0.1.jar:4.0.1]
at jakarta.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:151) ~[jakarta.xml.bind-api-4.0.1.jar:4.0.1]
at jakarta.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:161) ~[jakarta.xml.bind-api-4.0.1.jar:4.0.1]
at gov.nasa.pds.registry.common.connection.RegistryConnectionContent.from(RegistryConnectionContent.java:26) ~[registry-common-2.1.0-20241024.032014-5.jar:?]
at gov.nasa.pds.registry.common.EstablishConnectionFactory.from(EstablishConnectionFactory.java:19) ~[registry-common-2.1.0-20241024.032014-5.jar:?]
at gov.nasa.pds.registry.common.EstablishConnectionFactory.from(EstablishConnectionFactory.java:15) ~[registry-common-2.1.0-20241024.032014-5.jar:?]
at gov.nasa.pds.validate.ri.AuthInformation.getConnectionFactory(AuthInformation.java:30) ~[validate-3.6.0-SNAPSHOT.jar:?]
at gov.nasa.pds.validate.ri.OpensearchDocument.load(OpensearchDocument.java:25) [validate-3.6.0-SNAPSHOT.jar:?]
at gov.nasa.pds.validate.ri.OpensearchDocument.exists(OpensearchDocument.java:81) [validate-3.6.0-SNAPSHOT.jar:?]
at gov.nasa.pds.validate.ri.Cylinder.has_children(Cylinder.java:23) [validate-3.6.0-SNAPSHOT.jar:?]
at gov.nasa.pds.validate.ri.Cylinder.run(Cylinder.java:55) [validate-3.6.0-SNAPSHOT.jar:?]
at gov.nasa.pds.validate.ri.Engine.processQueueUntilEmpty(Engine.java:69) [validate-3.6.0-SNAPSHOT.jar:?]
at gov.nasa.pds.validate.ri.CommandLineInterface.process(CommandLineInterface.java:120) [validate-3.6.0-SNAPSHOT.jar:?]
at gov.nasa.pds.validate.ReferenceIntegrityMain.main(ReferenceIntegrityMain.java:14) [validate-3.6.0-SNAPSHOT.jar:?]
Caused by: java.io.FileNotFoundException: https://b3rqys09xmx9i19yn64i.us-west-2.aoss.amazonaws.com
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:2032) ~[?:?]
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1625) ~[?:?]
at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:224) ~[?:?]
at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source) ~[xercesImpl-2.12.2.jar:2.12.2]
at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source) ~[xercesImpl-2.12.2.jar:2.12.2]
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) ~[xercesImpl-2.12.2.jar:?]
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) ~[xercesImpl-2.12.2.jar:?]
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) ~[xercesImpl-2.12.2.jar:?]
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source) ~[xercesImpl-2.12.2.jar:?]
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source) ~[xercesImpl-2.12.2.jar:?]
at org.glassfish.jaxb.runtime.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:218) ~[jaxb-runtime-4.0.2.jar:4.0.2 - d104f19]
... 15 more
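The `Caused by: java.io.FileNotFoundException` entries in the traces above suggest that the value given to -r was itself fetched and parsed as an XML document, so a bare service endpoint fails before any query is made. A minimal sketch of an up-front check follows. It assumes a hypothetical helper class, `ConnectionArgCheck`, which is not part of validate or registry-common, and it assumes that connection files end in `.xml`, as the help text's example does.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical pre-check for the -r value; not part of validate or registry-common. */
final class ConnectionArgCheck {
  /**
   * Returns true when the argument parses as a URI with a scheme and a path
   * that names an XML file, e.g. app://connection/direct/localhost.xml.
   */
  static boolean looksLikeConnectionFile(String arg) {
    if (arg == null || arg.trim().isEmpty()) {
      return false;
    }
    try {
      URI uri = new URI(arg.trim());
      if (uri.getScheme() == null) {
        return false; // relative references are rejected in this sketch
      }
      String path = uri.getPath(); // null for opaque URIs, "" for a bare host
      return path != null && path.toLowerCase().endsWith(".xml");
    } catch (URISyntaxException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // The form shown in the CLI help passes the check.
    System.out.println(looksLikeConnectionFile("app://connection/direct/localhost.xml"));
    // A bare endpoint (placeholder host) has an empty path and is rejected.
    System.out.println(looksLikeConnectionFile("https://example.us-west-2.aoss.amazonaws.com"));
  }
}
```

A check like this could turn the unhandled exception into a one-line message telling the user that -r expects a connection file rather than a service endpoint.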
Would it help to say, "The value, not any of the attributes, of |
@al-niessner copy that. Could we use the same config registry-mgr uses instead? From @tloubrieu-jpl's instructions here, this looks like some XML config file, which makes sense since it really needs to be run with Cognito in the loop in order to authorize access, no? |
The value of Now that you know how it is all tied together, what words would be helpful in the help? |
@al-niessner right. I think I am using the right URL for one of our registries, and it is not working. Were you able to successfully test with a dev or production registry? |
Can you post or email me your command line? I will see if I can figure out what is wrong. |
The solution is not complete because it does not check for the products in all of the nodes' registries. |
Do we not have an index that maps to all of the indices? Would we prefer "index_a,index_b,index_c" or "index_a;index_b;index_c" or either? |
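If the answer to the separator question is "either", the parsing side is small. A minimal sketch, assuming a hypothetical `IndexListParser` helper; no such option or class exists in validate today, and the index names are the placeholders from the comment above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical parser for a multi-index argument; not an existing validate option. */
final class IndexListParser {
  /** Splits on ',' or ';', trims surrounding whitespace, and drops empty entries. */
  static List<String> parse(String arg) {
    return Arrays.stream(arg.split("[,;]"))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Mixed separators are accepted; prints [index_a, index_b, index_c]
    System.out.println(parse("index_a,index_b;index_c"));
  }
}
```

Accepting both separators keeps the CLI forgiving, at the cost of ruling out index names that contain either character.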
@al-niessner The problem here is the current AOSS auth layer does not support this. I may be able to test this with an admin user, but a regular node user would not be able to do this without a change to that layer. blocked by NASA-PDS/registry#348 |
I'm going to merge this because it kind of works, but it requires the entire bundle and all its references to be loaded into the same registry index, which is not how the system is architected. Updated the original comment to no longer resolve #621, since it doesn't really work as intended. |
🗒️ Summary
Use registry-common for access to opensearch via Java SDK v2
⚙️ Test Data and/or Report
Had to remove the automated RI checks, which did not seem to work anyway (they were causing errors that the checkers ignored), because they need a mock for the registry-common ConnectionFactory and RestClient. Loaded a DB by hand using the ref data repository, then ran RI. It ran to completion with no unexpected errors.
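One way future automated RI checks could avoid mocking ConnectionFactory and RestClient directly is to put the document lookup behind a small interface and supply an in-memory fake in tests. A minimal sketch, assuming a hypothetical `DocumentLookup` interface and `ReferenceWalker` class; neither name comes from validate or registry-common, the lidvids are made up, and the real RI traversal may differ.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical seam between the RI logic and the search backend. */
interface DocumentLookup {
  /** Returns the lidvids referenced by the given lidvid, or null if it is not in the database. */
  List<String> references(String lidvid);
}

/** Hypothetical stand-in for the RI traversal; not the code in gov.nasa.pds.validate.ri. */
final class ReferenceWalker {
  /** Walks references breadth-first from the root and returns every lidvid that is missing. */
  static List<String> findMissing(String root, DocumentLookup lookup) {
    List<String> missing = new ArrayList<>();
    Set<String> seen = new HashSet<>();
    Deque<String> queue = new ArrayDeque<>();
    queue.add(root);
    while (!queue.isEmpty()) {
      String current = queue.remove();
      if (!seen.add(current)) {
        continue; // already visited, which also guards against reference cycles
      }
      List<String> refs = lookup.references(current);
      if (refs == null) {
        missing.add(current);
      } else {
        queue.addAll(refs);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // In-memory fake database: the product referenced by the collection is absent.
    Map<String, List<String>> db = Map.of(
        "urn:example:bundle::1.0", List.of("urn:example:bundle:collection::1.0"),
        "urn:example:bundle:collection::1.0", List.of("urn:example:bundle:collection:product::1.0"));
    // prints [urn:example:bundle:collection:product::1.0]
    System.out.println(findMissing("urn:example:bundle::1.0", db::get));
  }
}
```

With a seam like this, the production implementation would wrap the registry-common connection while unit tests pass a map-backed lookup, so no live OpenSearch instance is needed.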
♻️ Related Issues
Alters #1053
Closes #895
Refs #621
Depends on NASA-PDS/registry-common#57
Depends on NASA-PDS/registry-common#89
Depends on NASA-PDS/harvest#191