-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Treat directory formats like other data files #2064
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2064 +/- ##
==========================================
+ Coverage 85.69% 87.19% +1.49%
==========================================
Files 91 139 +48
Lines 3782 6730 +2948
Branches 1220 1587 +367
==========================================
+ Hits 3241 5868 +2627
- Misses 455 772 +317
- Partials 86 90 +4 ☔ View full report in Codecov by Sentry. |
6007618
to
6b61b58
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I've been sitting on this, it got me thinking about twp things:
If directories should just get their own context, albeit it would probably look different from the current Context object. Outside of directories in objects.extensions I haven't come up with too many ideas on how this would be useful. (Assertions in the schema about empty directories?) Probably more trouble than its worth at the moment.
The other is if the call to replacePsuedoFiles shouldn't live in src/schema/walk.ts. walk would need access to the schema, we could piggy back this into dsContext which walk already receives. replacePseudoFiles is fundamentally a way of tricking walk into creating contexts for things it normally wouldn't. Nice because it reduces clutter in src/validators/bids.ts.
Happy to merge as is if that second point is problematic.
'tests/data/empty_files/sub-0001/meg/sub-0001_task-AEF_run-01_meg.ds/BadChannels', | ||
), | ||
undefined, | ||
'BadChannels should not be included in EMPTY_FILES error', | ||
'Contents of pseudo files should not be included in EMPTY_FILES error', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it true that directories and files that are children of psuedofile candidates should never show up in any error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the BadChannels and meg4 were only under discussion because the validator was given access to the directory contents, not the directory-as-file. Thus it makes sense to me to just back off from this micromanagement.
That said, I think it would be reasonable to add a reference to the original tree to the pseudofile, so that if we wanted to inspect contents to provide access to some metadata (e.g., OME-Zarr), it would be easy to do.
We definitely could update walk as: export async function* _walkFileTree(
fileTree: FileTree,
root: FileTree,
issues: DatasetIssues,
dsContext?: BIDSContextDataset,
): AsyncIterable<BIDSContext> {
for (const file of fileTree.files) {
yield new BIDSContext(root, file, issues, dsContext)
}
for (const dir of fileTree.directories) {
if (checkPseudofile(dir)) {
yield new BIDSContext(root, pseudoFile(dir), issues, dsContext)
} else {
yield* _walkFileTree(dir, root, issues, dsContext)
}
}
loadTSV.cache.delete(fileTree.path)
} Let me give that a try and see if it causes any interactions with the other PRs. |
I can do it with this diff:diff --git a/bids-validator/src/files/surgery.ts b/bids-validator/src/files/surgery.ts
deleted file mode 100644
index ffdeb573..00000000
--- a/bids-validator/src/files/surgery.ts
+++ /dev/null
@@ -1,50 +0,0 @@
-import { BIDSFile, FileTree } from '../types/filetree.ts'
-import { readEntities } from '../schema/entities.ts'
-
-function* walk(dir: FileTree): Generator<BIDSFile> {
- for (const file of dir.files) {
- yield file
- }
- for (const subdir of dir.directories) {
- yield* walk(subdir)
- }
-}
-
-const nullFile = {
- stream: new ReadableStream(),
- text: async () => '',
- readBytes: async (size: number, offset?: number) => new Uint8Array(),
-}
-
-function pseudoFile(dir: FileTree): BIDSFile {
- return {
- name: `${dir.name}/`,
- path: `${dir.path}/`,
- size: [...walk(dir)].reduce((acc, file) => acc + file.size, 0),
- ignored: dir.ignored,
- parent: dir.parent as FileTree,
- viewed: false,
- ...nullFile,
- }
-}
-
-function substitute(dir: FileTree, isFileLike: (file: FileTree) => boolean): void {
- const finalDirs = []
- for (const subdir of dir.directories) {
- if (isFileLike(subdir)) {
- dir.files.push(pseudoFile(subdir))
- } else {
- finalDirs.push(subdir)
- substitute(subdir, isFileLike)
- }
- }
- dir.directories = finalDirs
-}
-
-export function replacePseudoFiles(dir: FileTree, extensions: string[]): void {
- substitute(dir, (file) => {
- const { suffix, extension, entities } = readEntities(file.name)
- return (suffix !== '' && Object.keys(entities).length > 0 &&
- extensions.includes(`${extension}/`))
- })
-}
diff --git a/bids-validator/src/schema/context.ts b/bids-validator/src/schema/context.ts
index 8558f424..ee07b37b 100644
--- a/bids-validator/src/schema/context.ts
+++ b/bids-validator/src/schema/context.ts
@@ -6,7 +6,7 @@ import {
ContextNiftiHeader,
ContextSubject,
} from '../types/context.ts'
-import { GenericSchema } from '../types/schema.ts'
+import { Schema } from '../types/schema.ts'
import { BIDSFile, FileTree } from '../types/filetree.ts'
import { ColumnsMap } from '../types/columns.ts'
import { readEntities } from './entities.ts'
@@ -30,9 +30,12 @@ export class BIDSContextDataset implements ContextDataset {
issues: DatasetIssues
sidecarKeyValidated: Set<string>
options?: ValidatorOptions
+ schema: Schema
+ pseudofileExtensions: Set<string>
constructor(
options?: ValidatorOptions,
+ schema?: Schema,
tree?: FileTree,
description?: Record<string, unknown>,
ignored?: BIDSFile[],
@@ -40,6 +43,7 @@ export class BIDSContextDataset implements ContextDataset {
modalities?: string[],
issues?: DatasetIssues,
) {
+ this.schema = schema || {} as unknown as Schema
this.dataset_description = description || {}
this.tree = tree || new FileTree('/unknown', 'unknown')
this.ignored = ignored || []
@@ -50,6 +54,11 @@ export class BIDSContextDataset implements ContextDataset {
this.options = options
}
this.issues = issues || new DatasetIssues()
+ this.pseudofileExtensions = new Set<string>(
+ Object.values(this.schema?.objects?.extensions)
+ ?.map((ext) => ext.value)
+ ?.filter((ext) => ext.endsWith('/')),
+ )
}
get dataset_description(): Record<string, unknown> {
@@ -63,6 +72,15 @@ export class BIDSContextDataset implements ContextDataset {
: 'raw'
}
}
+
+ isPseudoFile(file: FileTree): boolean {
+ const { suffix, extension, entities } = readEntities(file.name)
+ return (
+ suffix !== '' &&
+ Object.keys(entities).length > 0 &&
+ this.pseudofileExtensions.has(`${extension}/`)
+ )
+ }
}
export class BIDSContextDatasetSubjects implements ContextDatasetSubjects {
@@ -82,7 +100,6 @@ export class BIDSContextDatasetSubjects implements ContextDatasetSubjects {
}
export class BIDSContext implements Context {
- schema?: GenericSchema
dataset: BIDSContextDataset
subject: ContextSubject
// path: string <- getter
@@ -116,7 +133,7 @@ export class BIDSContext implements Context {
this.suffix = bidsEntities.suffix
this.extension = bidsEntities.extension
this.entities = bidsEntities.entities
- this.dataset = dsContext ? dsContext : new BIDSContextDataset(undefined, fileTree)
+ this.dataset = dsContext ? dsContext : new BIDSContextDataset(undefined, undefined, fileTree)
this.subject = {} as ContextSubject
this.datatype = ''
this.modality = ''
@@ -127,6 +144,10 @@ export class BIDSContext implements Context {
this.associations = {} as ContextAssociations
}
+ get schema(): Schema {
+ return this.dataset.schema
+ }
+
get size(): number {
return this.file.size
}
diff --git a/bids-validator/src/schema/walk.ts b/bids-validator/src/schema/walk.ts
index caebf99d..5bfa4d3f 100644
--- a/bids-validator/src/schema/walk.ts
+++ b/bids-validator/src/schema/walk.ts
@@ -1,8 +1,35 @@
import { BIDSContext, BIDSContextDataset } from './context.ts'
-import { FileTree } from '../types/filetree.ts'
+import { BIDSFile, FileTree } from '../types/filetree.ts'
import { DatasetIssues } from '../issues/datasetIssues.ts'
import { loadTSV } from '../files/tsv.ts'
+function* quickWalk(dir: FileTree): Generator<BIDSFile> {
+ for (const file of dir.files) {
+ yield file
+ }
+ for (const subdir of dir.directories) {
+ yield* quickWalk(subdir)
+ }
+}
+
+const nullFile = {
+ stream: new ReadableStream(),
+ text: async () => '',
+ readBytes: async (size: number, offset?: number) => new Uint8Array(),
+}
+
+function pseudoFile(dir: FileTree): BIDSFile {
+ return {
+ name: `${dir.name}/`,
+ path: `${dir.path}/`,
+ size: [...quickWalk(dir)].reduce((acc, file) => acc + file.size, 0),
+ ignored: dir.ignored,
+ parent: dir.parent as FileTree,
+ viewed: false,
+ ...nullFile,
+ }
+}
+
/** Recursive algorithm for visiting each file in the dataset, creating a context */
export async function* _walkFileTree(
fileTree: FileTree,
@@ -12,7 +39,11 @@ export async function* _walkFileTree(
yield new BIDSContext(file, dsContext)
}
for (const dir of fileTree.directories) {
- yield* _walkFileTree(dir, dsContext)
+ if (dsContext.isPseudoFile(dir)) {
+ yield new BIDSContext(pseudoFile(dir), dsContext)
+ } else {
+ yield* _walkFileTree(dir, dsContext)
+ }
}
loadTSV.cache.delete(fileTree.path)
}
diff --git a/bids-validator/src/types/context.ts b/bids-validator/src/types/context.ts
index 0db8a792..6c4d7991 100644
--- a/bids-validator/src/types/context.ts
+++ b/bids-validator/src/types/context.ts
@@ -1,4 +1,4 @@
-import { GenericSchema } from './schema.ts'
+import { Schema } from './schema.ts'
import { ValidatorOptions } from '../setup/options.ts'
import { FileTree } from '../types/filetree.ts'
@@ -95,7 +95,7 @@ export interface ContextNiftiHeader {
sform_code: number
}
export interface Context {
- schema?: GenericSchema
+ schema: Schema
dataset: ContextDataset
subject: ContextSubject
path: string
diff --git a/bids-validator/src/types/schema.ts b/bids-validator/src/types/schema.ts
index b8a4f86f..9cc9414f 100644
--- a/bids-validator/src/types/schema.ts
+++ b/bids-validator/src/types/schema.ts
@@ -12,10 +12,18 @@ export interface Entity {
format: string
}
+export interface Value {
+ value: string
+}
+
export interface SchemaObjects {
+ datatypes: Record<string, Value>
+ enums: Record<string, Value>
+ entities: Record<string, Entity>
+ extensions: Record<string, Value>
files: Record<string, unknown>
formats: Record<string, Format>
- entities: Record<string, Entity>
+ suffixes: Record<string, Value>
}
export interface SchemaRules {
diff --git a/bids-validator/src/validators/bids.ts b/bids-validator/src/validators/bids.ts
index b39b6bd5..306777c5 100644
--- a/bids-validator/src/validators/bids.ts
+++ b/bids-validator/src/validators/bids.ts
@@ -17,7 +17,6 @@ import { sidecarWithoutDatafile, unusedStimulus } from './internal/unusedFile.ts
import { BIDSContext, BIDSContextDataset } from '../schema/context.ts'
import { parseOptions } from '../setup/options.ts'
import { hedValidate } from './hed.ts'
-import { replacePseudoFiles } from '../files/surgery.ts'
/**
* Ordering of checks to apply
@@ -46,13 +45,6 @@ export async function validate(
const schema = await loadSchema(options.schema)
summary.schemaVersion = schema.schema_version
- // @ts-expect-error
- const dirExtensions = Object.values(schema.objects.extensions)
- // @ts-expect-error
- .map((ext) => ext.value)
- .filter((ext) => ext.endsWith('/'))
- replacePseudoFiles(fileTree, dirExtensions)
-
/* There should be a dataset_description in root, this will tell us if we
* are dealing with a derivative dataset
*/
@@ -60,7 +52,7 @@ export async function validate(
(file: BIDSFile) => file.name === 'dataset_description.json',
)
- const dsContext = new BIDSContextDataset(options, fileTree)
+ const dsContext = new BIDSContextDataset(options, schema, fileTree)
if (ddFile) {
dsContext.dataset_description = await loadJSON(ddFile).catch((error) => {
dsContext.issues.addNonSchemaIssue(error.key, [ddFile]) At this point it would be easiest to do it on top of #2066. LMK if you want me to go ahead. |
And if we add the |
6b61b58
to
2448b61
Compare
c553477
to
462cfb1
Compare
1dfab36
to
e5b0aba
Compare
This PR does surgery on the
fileTree
to find data "files" that are actually directories, usingschema.objects.extensions
to identify them. A data file must have entities and a suffix, so this is finely targeted.From experience, getting the detection wrong results in a huge number of false positives as
anat/
doesn't match any file rules. We can also now drop the.zarr
and.ds
ignore rules and depend on the schema.Follow-up to #2063.