-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[builder] CxG schema 5 / Census schema 2 #1024
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1024 +/- ##
=======================================
Coverage 81.32% 81.33%
=======================================
Files 73 73
Lines 5553 5566 +13
=======================================
+ Hits 4516 4527 +11
- Misses 1037 1039 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
@pablo-gar - thanks for the assay term changes. Question/open issue: the Census schema also defines special handling for the normalized layer, for any "Smart-Seq" assays. Can we add the definitive list of which EFO terms are considered a Smart-Seq assay? Ideally this would be included in the Census schema (or referenced as you did with the assay filter terms). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
docs/cellxgene_census_schema.md
Outdated
@@ -339,15 +223,15 @@ An example of this `SOMADataFrame` is shown below: | |||
<tbody> | |||
<tr> | |||
<td>census_schema_version</td> | |||
<td>1.3.0</td> | |||
<td>2.0.0</td> | |||
</tr> | |||
<tr> | |||
<td>census_build_date</td> | |||
<td>2022-11-30</td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super nitpick, but it could be a good idea to move this date forward to make the example more realistic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@bkmartinjr - In regards to issue #993 there is this point:
Once a test build is produced, are we to check this manually in the python interpreter by doing a count of Or should this be captured in some type of of post build acceptance test where some sanity checks about the data are done? |
This is entirely an upstream issue in the DP process, and the builder does not enforce (or check) for this level of metadata compliance with the CxG schema. These checks for compliance with the CxG schema are the provenance of the schema validation toolkit used by Lattice, et al. We could do this kind of checks, but it is redundant, and adds linkages across layers in the system that don't add much value IMHO. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a look at the schema updates and the updated list of accepted assays. Looks good!
@atarashansky - we have included your requested organisms info (#796) in Census schema 2.0.0. Current content will be:
Please feel free to leave comments on both the Schema MD file changes and the code. |
Light QC on test build shows no issues Checks
See notebook |
Builder support for Census schema 2.0.0 / CELLxGENE schema 5.0.0
Fixes: #993
Fixes: #1022
Fixes: #796
Primary changes: