Unable to search for cassini LDD attributes in ISS datasets #148

Closed
jordanpadams opened this issue Sep 16, 2024 · 23 comments · Fixed by #149

@jordanpadams
Member

jordanpadams commented Sep 16, 2024

Checked for duplicates

Yes - I've already checked

πŸ› Describe the bug

When I tried to search by cassini:ISS_Specific_Attributes.cassini:image_number via the API, I get no results, when I should get many.

πŸ•΅οΈ Expected behavior

I expected to be able to search by this field and get a result

📜 To Reproduce

https://pds.mcp.nasa.gov/api/search/1/products?q=(cassini:ISS_Specific_Attributes.cassini:image_number%20eq%20%221454725799%22) should return 1 result: https://pds-rings.seti.org/pds4/bundles/cassini_iss_saturn//data_raw/14547xxxxx/1454725799n.xml

The same query in Kibana Discover also returns nothing.
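For convenience, a hedged reproduction sketch of the failing request above, in Python with `requests`; it assumes the API lists matching products under a `data` key in the response body.

```python
import requests

# Same query as the URL above, with the URL-encoding decoded.
resp = requests.get(
    "https://pds.mcp.nasa.gov/api/search/1/products",
    params={"q": '(cassini:ISS_Specific_Attributes.cassini:image_number eq "1454725799")'},
)
products = resp.json().get("data", []) if resp.ok else []  # assumption: matches come back under "data"
print(resp.status_code, len(products), "product(s) returned")  # expected: 1, observed: 0
```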

🖥 Environment Info

  • Version of this software [e.g. vX.Y.Z]
  • Operating System: [e.g. MacOSX with Docker Desktop vX.Y]
    ...

📚 Version of Software Used

No response

🩺 Test Data / Additional context

No response

🦄 Related requirements

🦄 NASA-PDS/registry-api#539

βš™οΈ Engineering Details

I am concerned more broadly that attributes throughout the systems are randomly unsearchable because harvest is not or was not properly creating fields in the schema prior to loading them into the index. Not sure how we can scrub this, but a sweeper may be necessary to somehow scan and fix this all the time.

🎉 Integration & Test

No response

@alexdunnjpl
Contributor

Confirmed that the document is in rms-registry and contains the relevant key and value.

Confirmed that the key is missing from the rms-registry _mapping (and in fact there is no mapping for any attribute referencing "cassini").
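A sketch of those two checks, using opensearch-py (assumed available) against a placeholder connection; the index name and field handling are illustrative, not the exact commands run.

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])  # placeholder connection
INDEX = "registry"  # illustrative index name

# The document is present and carries the value in question...
total = client.search(index=INDEX, body={
    "query": {"query_string": {"query": '"1454725799"'}},
    "size": 0,
})["hits"]["total"]["value"]
print("documents containing the value:", total)

# ...but nothing cassini-related appears in the index mapping.
props = client.indices.get_mapping(index=INDEX)[INDEX]["mappings"]["properties"]
print("cassini mappings:", [name for name in props if "cassini" in name])  # [] as reported
```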

Harvest date/time is 2022-06-28T23:09:48.274461Z, so my soft assumption is that this is the result of a bug or missing feature in harvest which has since been addressed.

I would suggest re-harvesting that product and re-testing to confirm that the expected entries are added to the index mappings.

@jordanpadams this will be pretty delicate and (computationally) expensive to fix with repairkit if it isn't a fairly isolated issue, because it requires either non-noop updates to the relevant fields or deletion/reinsertion, once the mapping entries are added. The cleanest way to do it would probably be for repairkit to

  • iterate through the doc corpus and for each doc
    • update the mappings
    • flag documents requiring re-indexing using a metadata property
  • re-index flagged documents to a temporary index
  • delete flagged documents from the source index
  • reindex the temp index back to the source index
  • delete the temp index

This should be idempotent and avoid any potential for data loss, and could be run from a local env to avoid blowing out the cloud-sweeper task runtime.
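A rough sketch of that temp-index route, using opensearch-py (assumed); the index names and the flag field are placeholders, not the real repairkit metadata.

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])  # placeholder connection
SRC, TMP = "registry", "registry-repair-tmp"                      # illustrative index names
flagged = {"term": {"ops:needs_reindex": True}}                   # hypothetical flag property

# Copy flagged docs out, delete them from the source, copy them back, drop the temp index.
client.reindex(body={"source": {"index": SRC, "query": flagged}, "dest": {"index": TMP}},
               wait_for_completion=True)
client.delete_by_query(index=SRC, body={"query": flagged})
client.reindex(body={"source": {"index": TMP}, "dest": {"index": SRC}},
               wait_for_completion=True)
client.indices.delete(index=TMP)
```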

What's the source for the mapping types? That DD you pointed me to a little while back?

@alexdunnjpl
Contributor

alexdunnjpl commented Sep 19, 2024

After a little searching, it looks like there may be a slightly easier solution: apparently ES/OS documents are immutable, so any meaningful (non-noop) update to a document triggers a re-index of the entire document.

Ergo, it should be sufficient to add all missing properties to the index mapping and then write a metadata flag value (showing that the document has been checked) for all unchecked documents, with no need to play around with temporary indices.

EDIT: Yep, this is the case, tested and confirmed.
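A minimal sketch of that approach with opensearch-py (assumed): ensure the missing mapping entries exist, then write a sweep flag to every unchecked document so each one gets rewritten, and therefore re-indexed, in place. The flag name, mapping entry, and field-name flattening are illustrative.

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])  # placeholder connection
INDEX = "registry"                                                # illustrative index name
FLAG = "ops:reindexed"                                            # hypothetical sweeper metadata property

# 1. Add the missing property mappings (typenames resolved elsewhere).
client.indices.put_mapping(index=INDEX, body={"properties": {
    "cassini:ISS_Specific_Attributes/cassini:image_number": {"type": "keyword"},  # illustrative flattening
}})

# 2. Flag every unchecked document; the non-noop update rewrites (and re-indexes) each doc
#    against the newly added mappings.
client.update_by_query(index=INDEX, body={
    "query": {"bool": {"must_not": {"exists": {"field": FLAG}}}},
    "script": {"source": f"ctx._source['{FLAG}'] = true", "lang": "painless"},
}, conflicts="proceed")
```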

@nutjob4life
Member

Going well! 🎉 Details? See above ↑

@alexdunnjpl alexdunnjpl transferred this issue from NASA-PDS/registry-api Oct 2, 2024
@alexdunnjpl
Contributor

Implemented in index-mapping-repair, with two outstanding wrinkles:

Resolution of the missing property mapping typenames is TBD (@jordanpadams please weigh in on that).

@jordanpadams @tloubrieu-jpl the sweeper queries twice - once to generate the set of missing mappings, then again to generate/write the doc updates once the mappings have been ensured. These two queries need to return consistent results, otherwise an old version of harvest could write new documents in the middle of a sweep which would get picked up in the second stage but not the first.

In that (theoretically-possible but shockingly-unlikely) event, those documents would erroneously be marked as fixed and excluded from future sweeps, and the only way to detect them would be to manually run the sweeper with the redundant-work filter disabled. Pick an option, in increasing order of rigor:

  1. The likelihood of someone running an obsolete version of harvest at exactly the wrong time is functionally zero - don't guard against it.

  2. Instead of filtering to "documents which haven't been swept before", apply an additional constraint of "harvest time is earlier than sweeper execution start".

  3. Use a point-in-time search.

3 is the most-correct option, but may not be compatible with our dockerized registry, so I'd prefer to go with 2, or 1 if you're absolutely sure no-one will run a pre-2023 version of harvest at just the wrong time.
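A sketch of option 2's filter: both sweep passes constrain to "not yet swept AND harvested before the sweep started", so a document written mid-sweep by an old harvest is simply deferred to the next run. The flag and harvest-timestamp field names here are assumptions.

```python
from datetime import datetime, timezone

sweep_start = datetime.now(timezone.utc).isoformat()

unswept_before_start = {
    "bool": {
        "must_not": [{"exists": {"field": "ops:reindexed"}}],  # hypothetical sweep flag
        "filter": [{"range": {
            "ops:Harvest_Info/ops:harvest_date_time": {"lt": sweep_start}  # assumed field name
        }}],
    }
}
# Use `unswept_before_start` as the query for both the mapping-discovery pass
# and the update-generation pass.
```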

@alexdunnjpl
Contributor

alexdunnjpl commented Oct 3, 2024

Resolve missing types by cracking open the doc's blob, extracting the DD url, and reading it.

Cache downloaded DDs, cache cracked blobs, and avoid cracking for mappings which have already been resolved by the sweeper.
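A sketch of that caching, under the assumption that the DDs are fetchable as JSON over HTTP; the type-resolution walk itself is omitted and defaulted.

```python
from functools import lru_cache

import requests

@lru_cache(maxsize=None)
def fetch_dd(dd_url: str) -> dict:
    """Download and parse each data dictionary at most once per sweep."""
    resp = requests.get(dd_url, timeout=30)
    resp.raise_for_status()
    return resp.json()

resolved_types: dict[str, str] = {}  # property name -> mapping typename, shared across the sweep

def resolve_type(property_name: str, dd_url: str) -> str:
    """Skip blob-cracking/DD lookups for properties already resolved this sweep."""
    if property_name not in resolved_types:
        dd = fetch_dd(dd_url)
        # Walking `dd` for the attribute's declared type is omitted here; default to keyword.
        resolved_types[property_name] = "keyword"
    return resolved_types[property_name]
```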


@alexdunnjpl
Contributor

alexdunnjpl commented Oct 3, 2024

Per @jordanpadams: log the earliest/latest harvest timestamp for the affected files, and a unique list of harvest versions. Pull these from the docs themselves.
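A sketch of gathering those values in a single aggregation query (opensearch-py assumed; the flag and harvest field names are best-guess placeholders).

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])  # placeholder connection

aggs = client.search(index="registry", body={
    "size": 0,
    "query": {"bool": {"must_not": {"exists": {"field": "ops:reindexed"}}}},  # hypothetical flag
    "aggs": {
        "earliest": {"min": {"field": "ops:Harvest_Info/ops:harvest_date_time"}},  # assumed field
        "latest": {"max": {"field": "ops:Harvest_Info/ops:harvest_date_time"}},
        "versions": {"terms": {"field": "ops:Harvest_Info/ops:harvest_version", "size": 100}},
    },
})["aggregations"]

print("earliest:", aggs["earliest"].get("value_as_string"))
print("latest:", aggs["latest"].get("value_as_string"))
print("versions:", [b["key"] for b in aggs["versions"]["buckets"]])
```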

@alexdunnjpl
Contributor

Per @jordanpadams,

I think you can use the -dd indexes in the registry for tracking down these classes/attributes.
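A hedged sketch of looking an attribute up in a *-dd index; the index name and field names here are assumptions about the data-dictionary index schema, not verified.

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])  # placeholder connection

hits = client.search(index="registry-dd", body={  # assumed name of the data-dictionary index
    "query": {"match": {"attribute_name": "cassini:ISS_Specific_Attributes.cassini:image_number"}},
    "size": 1,
})["hits"]["hits"]
print(hits[0]["_source"] if hits else "not found in registry-dd")
```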

@alexdunnjpl alexdunnjpl moved this from ToDo to ⚙ Review / QA in EN Portfolio Backlog Oct 10, 2024
@alexdunnjpl alexdunnjpl moved this from Release Backlog to ⚙ Review / QA in B15.1 Oct 10, 2024
@alexdunnjpl
Contributor

Status: implemented, in review.
Per @jordanpadams, review/live-test is postponed until next week, after the current site demos.

@alexdunnjpl
Contributor

alexdunnjpl commented Oct 14, 2024

Status: testing against MCP rms is in progress.

The initial run ran for about 40 minutes and got roughly halfway through before AOSS throttled it. I doubt this is something we care to address if sweeper initialization is the only thing that hits whatever limits are imposed.

@jordanpadams @tloubrieu-jpl the query referred to in the OP now successfully hits a single document. 1830h EDIT: Well, it did... it doesn't appear to now. I'll need to investigate further. EDIT 2: aaand it's working again. Probably just some weird reindexing behaviour.

Once it checks out, want me to run it against all the other nodes, and include all the sweepers (not just the reindexer)?

EDIT: Sweeper is exhibiting the same result-skipping behaviour as repairkit, which I should've seen coming. I'll implement the same fix as was applied there.

@alexdunnjpl
Contributor

For rms, problems were detected for harvest version 3.8.1, and harvest timestamps 2022-06-28 through 2024-03-13.

Logs were long due to many documents not having harvest versions and throwing warnings, so I haven't sent them through - @jordanpadams let me know if you'd like me to strip those out and send them.

@alexdunnjpl
Contributor

Confirmed with EN that a single run is sufficient to reindex all documents. Currently running manually against all nodes, storing logs for later analysis.

@alexdunnjpl
Contributor

Status: running against large indices appears to overload those indices. Need to figure out a way to consistently page through the documents.

Given the way it works:

  • the workload can be chunked without issue if the first/second query can be guaranteed to return the same result set (this means PIT, most likely, or sorting by harvest timestamp if that's infeasible), since any product which has yet to be processed will eventually be updated/reindexed, and any product which is updated/reindexed is guaranteed to have had its appropriate mappings created already (see the sketch after this list)

  • if that is difficult or impossible, a naive approach which pages blindly could work iff the update generation step also checks that the mapping is present, not yielding an update if a missing mapping exists at update-creation-time
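A sketch of that deterministic paging: sort on the harvest timestamp plus a unique tiebreaker and walk the corpus with search_after, so both sweep passes see the same ordering without PIT (opensearch-py assumed; field names are placeholders).

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])  # placeholder connection
SORT = [
    {"ops:Harvest_Info/ops:harvest_date_time": "asc"},  # assumed field name
    {"lidvid": "asc"},                                  # unique tiebreaker, assumed keyword field
]

def iter_unswept(index: str, page_size: int = 500):
    """Page deterministically through unswept docs so chunks remain consistent between passes."""
    search_after = None
    while True:
        body = {
            "size": page_size,
            "sort": SORT,
            "query": {"bool": {"must_not": {"exists": {"field": "ops:reindexed"}}}},  # hypothetical flag
        }
        if search_after is not None:
            body["search_after"] = search_after
        hits = client.search(index=index, body=body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        search_after = hits[-1]["sort"]
```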

@alexdunnjpl
Contributor

PIT search is only available as of OpenSearch 2.4, and while AWS OpenSearch Service supports OpenSearch 2.15, AWS OpenSearch Serverless collections are currently on OpenSearch 2.0:

Serverless collections currently run OpenSearch version 2.0.x. As new versions are released, OpenSearch Serverless will automatically upgrade your collections to consume new features, bug fixes, and performance improvements.

so point-in-time search is not available to us at this point and a stopgap solution must be implemented.

@jordanpadams
Member Author

Status: continuing to test this more rigorously; ran into some issues on production indexes. Working on improving the algorithm to support this.

@alexdunnjpl
Contributor

Status: the current run against ATM was terminated, as the cluster is having to do a bunch of redundant work (indexing is slow, resulting in duplicated updates being written, resulting in more requests which probably affect indexing performance even when they're no-ops).

The sweeper will be updated to pause until 95(?)% of the pending updates have been processed and are reflected in the remaining-hits count.
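A sketch of that pause logic: after submitting a batch of updates, poll the remaining-hits count until roughly the configured fraction of the submitted updates is reflected. The names are placeholders; the client is any opensearch-py client.

```python
import time

def wait_for_updates_to_land(client, index: str, remaining_query: dict,
                             baseline_hits: int, submitted: int,
                             threshold: float = 0.95, poll_seconds: float = 10.0) -> int:
    """Block until the remaining-hits count reflects ~threshold of the submitted updates."""
    target = baseline_hits - int(submitted * threshold)
    while True:
        remaining = client.count(index=index, body={"query": remaining_query})["count"]
        if remaining <= target:
            return remaining
        time.sleep(poll_seconds)
```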

ATM will require reindexing once @sjoshi-jpl is back because reasons

@alexdunnjpl
Contributor

Status: flow management code is tested and operational. Completing ATM sweep, will review logs with @jordanpadams before running against other nodes.

@alexdunnjpl
Contributor

Status: the flow management code is imperfect; it sometimes waits for a hits change which never comes.
Next step: implement a check which stops stalling if the same hits count is returned n times in a row. The stall functionality will need to be extracted into its own class at this point - it's getting sufficiently complicated.
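A sketch of that extraction: a small stall helper that waits on the remaining-hits count but gives up once the same count comes back n times in a row (names and thresholds are placeholders).

```python
import time
from typing import Callable

class HitsCountStall:
    """Stall until the remaining-hits count drops to a target, but stop waiting if the
    count stops changing for `max_repeats` consecutive polls."""

    def __init__(self, max_repeats: int = 5, poll_seconds: float = 10.0):
        self.max_repeats = max_repeats
        self.poll_seconds = poll_seconds

    def wait(self, get_remaining_hits: Callable[[], int], target: int) -> int:
        previous, repeats = None, 0
        while True:
            remaining = get_remaining_hits()
            if remaining <= target:
                return remaining
            repeats = repeats + 1 if remaining == previous else 0
            if repeats >= self.max_repeats:
                return remaining  # the count has stopped moving; stop stalling and proceed
            previous = remaining
            time.sleep(self.poll_seconds)
```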

@tloubrieu-jpl
Member

@alexdunnjpl is looking at how not to overload OpenSearch.

@alexdunnjpl
Contributor

Development is complete and running successfully against ATM (which will need manual reindexing when @sjoshi-jpl returns); will now run against another, larger node to fully validate.

@tloubrieu-jpl
Member

Some bugs needed to be fixed while testing on IMG. That will be ready for merge shortly.

@tloubrieu-jpl
Member

Alex is running the sweepers in production.

@alexdunnjpl
Contributor

alexdunnjpl commented Nov 13, 2024

PSA is in progress and should finish in the next day or two.

IMG will need to be re-run after migration.

@tloubrieu-jpl tloubrieu-jpl removed their assignment Nov 19, 2024
@alexdunnjpl
Contributor

alexdunnjpl commented Nov 26, 2024

@tloubrieu-jpl PSA is complete. PR is ready for review/merge - will close that out now

@github-project-automation github-project-automation bot moved this from ⚙ Review / QA to 🏁 Done in B15.1 Nov 26, 2024
@github-project-automation github-project-automation bot moved this from ⚙ Review / QA to 🏁 Done in EN Portfolio Backlog Nov 26, 2024