Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HED Validator integration into bids schema based validator. #1648

Merged
merged 46 commits into from
Jul 31, 2024

Conversation

rwblair
Copy link
Member

@rwblair rwblair commented Mar 27, 2023

For #1639.

bids-validator/src/deps/hed-validatordependency was generated by runningnpm run build` on latest commit for hed-javascript:
hed-standard/hed-javascript@bba927b

Two main functions located in bids-validator/src/validators/hed.ts are hedAccumulator and hedValidate

hedAccumulator is called once per context like other checks (filename validation, schema checks) in bids-validator/src/validators/bids.ts. It catches any events.tsv files and json files and builds up an object to be passed to the actual validator call.

Instead of recreating the old data structures that were passed into bids-validator/validators/events/hed.js I tried to get everything in shape a bit closer to where the outgoing calls were made, not sure if I got it quite right. The contexts have the proper sidecar data in them already so no need to mess with potentialSidecars.

hedValidate is called once, after the context walk is complete.

  • copy over the hed-validator issue to bids-validator issue functions, modify to work with new issue objects.
  • Add new issues to bids-validator/src/issues/list.ts as needed.
  • Copy over detectHed function to skip calls when no HED data present.
  • Pin dependency to actual release of hed-validator.
  • Add tests

Currently am getting "ERROR: [HED_SCHEMA_LOAD_FAILED] The supplied schema specification is invalid. Specification: no sche..." when run on bids-examples/eeg_ds003645s_hed

Latest commits incorporate the hed validator api changes and follow the lead of https://github.com/bids-standard/bids-validator/pull/2014

Validation is done per file as a context level check instead of original accumulator/dataset level check.

…ion. Add hed validator dep built from current master.
@rwblair rwblair marked this pull request as draft March 27, 2023 22:10
@codecov
Copy link

codecov bot commented Mar 28, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.75%. Comparing base (ad6c900) to head (5beb84f).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1648   +/-   ##
=======================================
  Coverage   85.75%   85.75%           
=======================================
  Files          91       91           
  Lines        3785     3785           
  Branches     1218     1218           
=======================================
  Hits         3246     3246           
  Misses        453      453           
  Partials       86       86           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@VisLab
Copy link
Member

VisLab commented Aug 9, 2023

@rwblair --- should we be nervous that this is PR is still a draft? Is there anything that we need to prepare for? Thx! VisLab

@rwblair rwblair marked this pull request as ready for review September 28, 2023 20:30
@rwblair rwblair requested a review from nellh September 28, 2023 20:31
@VisLab
Copy link
Member

VisLab commented Oct 16, 2023

I know you have been swamped, but could you give us an update on this @nellh? Thx.

Copy link
Member

@nellh nellh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ready to merge. I ran through some test cases and was able to match the failures to the legacy validator. ds004395 runs out of memory with the default heap size but does pass increasing the size to 8GB. There could be some memory optimization to be done because this doesn't happen without the HED patches but that dataset is an exception, every other test case passed with the default heap size.

@VisLab
Copy link
Member

VisLab commented Jan 11, 2024 via email

@rwblair
Copy link
Member Author

rwblair commented Jan 12, 2024

Looks like the memory issue is coming from code I wrote, via the hedAccumulator in the bids-validator. Commenting out the call to the hed-validator and to the file accumulator allows ds004395 to finish validation. Commenting out just the call to the hed-validator but leaving the accumulator active causes it to run out of memory on default heap size.

I should finish profiling before running my mouth.

…l hed-validator call commented out for testing memory usage
@happy5214
Copy link
Collaborator

I'm not sure what's going on with the Deno build. The lines in the errors do not appear in the version of that file in our code base, and we don't use Buffer objects directly (though some dependencies might).

@happy5214
Copy link
Collaborator

@rwblair Has there been any further progress on this?

@happy5214
Copy link
Collaborator

I believe the failing test is because of an off-by-one error (you're converting the TSV data to the old format, which you shouldn't (BidsTsvFile's constructor now expects either a string or the new ColumnMap-style format), but you're forgetting to add the header as the first row).

Copy link
Collaborator

@effigies effigies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks reasonable.

bids-validator/src/validators/bids.ts Outdated Show resolved Hide resolved
bids-validator/LICENSE Outdated Show resolved Hide resolved
bids-validator/src/validators/hed.ts Outdated Show resolved Hide resolved
bids-validator/src/validators/hed.ts Outdated Show resolved Hide resolved
const tsvData = await context.file.text()
const eventFile = new hedValidator.bids.BidsTsvFile(
context.path,
tsvData,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would passing the ColumnMap (or a copy) work here, or would that expose too much internal data? The parameter type in hed-validator is Map<string, string[]>, which appears to be ColumnMap's parent type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the columns map here generates an error:
TypeError: Cannot read properties of undefined (reading 'forEach')

This is coming from convertParsedTSVData:
https://github.com/hed-standard/hed-javascript/blob/3be833f929cd3db5e259b315ba6cff6e1c677f59/bids/tsvParser.js#L45
via:
https://github.com/hed-standard/hed-javascript/blob/3be833f929cd3db5e259b315ba6cff6e1c677f59/bids/types/tsv.js#L53

I think this is because Map objects pass the check meant to catch the old style tsv data:

> let x = new Map()
undefined
> x === Object(x)
true

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@happy5214 Does it make sense to go ahead as-is and then submit a PR once hed-validator can handle a ColumnMap directly?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#2041 should fix that issue. The type check was a bug that proved easy to fix.

@rwblair rwblair merged commit c06e88d into master Jul 31, 2024
33 checks passed
@rwblair rwblair deleted the schema/hedIntegration branch July 31, 2024 21:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants