doc: Document the validation model, context and inheritance principle #94

Merged (3 commits), Nov 11, 2024
Changes from 1 commit
10 changes: 10 additions & 0 deletions docs/index.md
@@ -26,6 +26,7 @@ deno run -A jsr:@bids/validator
```

```{toctree}
:maxdepth: 2
:hidden:
:caption: User guide

@@ -35,6 +36,7 @@ user_guide/issues.md
```

```{toctree}
:maxdepth: 2
:hidden:
:caption: Developer guide

@@ -43,6 +45,14 @@ dev/contributing.md
dev/environment.md
```

```{toctree}
:maxdepth: 2
:hidden:
:caption: Concepts

validation-model/index.md
```

```{toctree}
:hidden:
:caption: Reference
159 changes: 159 additions & 0 deletions docs/validation-model/context.md
@@ -0,0 +1,159 @@
# The validation context

The core structure of the validator is the `context`,
a namespace that aggregates properties of the dataset
(the `dataset` variable in the [validation model](index.md) pseudocode)
and of the current file being validated.

Its type can be described as follows:

```typescript
// TypeScript has no built-in integer type; alias it for readability
type integer = number

interface Context {
  // Dataset properties
  dataset: {
    dataset_description: object
    datatypes: string[]
    modalities: string[]
    // Lists of subjects as discovered in different locations
    subjects: {
      sub_dirs: string[]
      participant_id: string[]
      phenotype: string[]
    }
  }

  // Properties of the current subject
  subject: {
    // Lists of sessions as discovered in different locations
    sessions: {
      ses_dirs: string[]
      session_id: string[]
      phenotype: string[]
    }
  }

  // Path properties
  path: string
  entities: object
  datatype: string
  suffix: string
  extension: string
  // Inferred property
  modality: string

  // Inheritance principle constructions
  sidecar: object
  associations: {
    // Paths and properties of files associated with the current file
    aslcontext: { path: string, n_rows: integer, volume_type: string[] }
    // ...
  }

  // Content properties
  size: integer

  // File type-specific content properties
  columns: object
  gzip: object
  json: object
  nifti_header: object
  ome: object
  tiff: object
}
```

To take an example, in a minimal dataset containing only a single subject's T1-weighted image,
the context for that image might be:

```yaml
dataset:
  dataset_description:
    Name: "Example dataset"
    BIDSVersion: "1.10.0"
    DatasetType: "raw"
  datatypes: ["anat"]
  modalities: ["mri"]
  subjects:
    sub_dirs: ["sub-01"]
    participant_id: null
    phenotype: null

subject:
  sessions: { ses_dirs: null, session_id: null, phenotype: null }

path: "/sub-01/anat/sub-01_T1w.nii.gz"
entities:
  subject: "01"
datatype: "anat"
suffix: "T1w"
extension: ".nii.gz"
modality: "mri"

sidecar:
  MagneticFieldStrength: 3
  # ...
associations: {}

size: 22017017
nifti_header:
  dim: 3
  voxel_sizes: [1, 1, 1]
  # ...
```

Fields from this context can be queried using object dot notation.
For example, `sidecar.MagneticFieldStrength` has the integer value `3`,
and `entities.subject` has the string value `"01"`.
This permits the use of boolean expressions, such as
`sidecar.RepetitionTime == nifti_header.pixdim[4]`.
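
As a rough illustration of dot-notation lookup, the sketch below resolves a dotted field name against a nested dictionary.
The `lookup` helper, the plain-dict representation of the context, and the `RepetitionTime` and `pixdim` values are assumptions made for this example;
the validator's actual expression evaluation is not shown in this document.

```python
from functools import reduce

# Hypothetical, trimmed-down context for illustration only.
context = {
    "entities": {"subject": "01"},
    "sidecar": {"MagneticFieldStrength": 3, "RepetitionTime": 2.0},
    "nifti_header": {"pixdim": [1.0, 1.0, 1.0, 1.0, 2.0]},
}


def lookup(ctx, dotted_name):
    """Resolve a dotted name such as 'sidecar.RepetitionTime'.

    Returns None if any component along the way is missing.
    """
    def step(node, key):
        return node.get(key) if isinstance(node, dict) else None

    return reduce(step, dotted_name.split("."), ctx)


field_strength = lookup(context, "sidecar.MagneticFieldStrength")
subject = lookup(context, "entities.subject")
# Equivalent of `sidecar.RepetitionTime == nifti_header.pixdim[4]`
tr_matches = (
    lookup(context, "sidecar.RepetitionTime")
    == lookup(context, "nifti_header.pixdim")[4]
)
```

Returning `None` for a missing field is one possible design choice here; it lets a check distinguish "field absent" from "field present but wrong" without raising an exception.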

As the validator validates each file in turn, it constructs a new context.
The `dataset` property remains constant,
while a new `subject` property is constructed when inspecting a new subject directory,
and the remaining properties are constructed for each file, individually.

## Context definition

The validation context is largely dictated by the [schema],
and the full type generated from the schema definition can be found in
[jsr:@bids/schema/context](https://jsr.io/@bids/schema/doc/context/~/Context).

## Context construction

The construction of a validation context is where BIDS concepts are implemented.
Again, this is easiest to explain with pseudocode:

```python
def buildFileContext(dataset, file):
    context = namespace()
    context.dataset = dataset
    context.path = file.path
    context.size = file.size

    fileParts = parsePath(file.path)
    context.entities = fileParts.entities
    context.datatype = fileParts.datatype
    context.suffix = fileParts.suffix
    context.extension = fileParts.extension

    context.subject = buildSubjectContext(dataset, context.entities.subject)

    context.sidecar = loadSidecar(file)
    context.associations = namespace({
        association: loadAssociation(file, association)
        for association in associationTypes(file)
    })

    if isTSV(file):
        context.columns = loadColumns(file)
    if isNIfTI(file):
        context.nifti_header = loadNiftiHeader(file)
    ...  # And so on

    return context
```

The heavy lifting is done in `parsePath`, `loadSidecar` and `loadAssociation`.
`parsePath` is relatively simple, but `loadSidecar` and `loadAssociation`
implement the BIDS [Inheritance Principle].
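
A minimal sketch of what a `parsePath`-style function might do, assuming the common BIDS filename pattern of `key-value` entities separated by underscores, followed by a suffix and an extension.
The name `parse_path`, the `FileParts` container, the short `ENTITY_NAMES` table, and the rules used for the extension and the datatype are assumptions for illustration;
they are not taken from the validator's implementation.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath

# Assumed mapping from filename keys to the long entity names used in the
# example context above; only a few entries are listed for illustration.
ENTITY_NAMES = {"sub": "subject", "ses": "session", "task": "task", "run": "run"}


@dataclass
class FileParts:
    entities: dict = field(default_factory=dict)
    datatype: str = ""
    suffix: str = ""
    extension: str = ""


def parse_path(path):
    """Split a BIDS-style path into entities, datatype, suffix and extension."""
    p = PurePosixPath(path)
    # Assumption: the extension is everything from the first '.' onward.
    stem, dot, ext = p.name.partition(".")
    *pairs, suffix = stem.split("_")
    entities = {}
    for pair in pairs:
        key, _, value = pair.partition("-")
        entities[ENTITY_NAMES.get(key, key)] = value
    parent = p.parent.name
    # Assumption: subject/session directories and the root are not datatypes.
    is_datatype = parent and not parent.startswith(("sub-", "ses-"))
    return FileParts(entities, parent if is_datatype else "", suffix, dot + ext)


parts = parse_path("/sub-01/anat/sub-01_T1w.nii.gz")
```

With the example path from the context above, `parts` carries the same `entities`, `datatype`, `suffix` and `extension` values as the YAML listing.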

[Inheritance Principle]: https://bids-specification.readthedocs.io/en/stable/common-principles.html#the-inheritance-principle
31 changes: 31 additions & 0 deletions docs/validation-model/index.md
@@ -0,0 +1,31 @@
# Validation model

The basic process of the BIDS validator operates according to the following
[Python]-like pseudocode:

```python
def validate(directory):
    fileTree = loadFileTree(directory)
    dataset = buildDatasetContext(fileTree)

    for file in walk(dataset.fileTree):
        context = buildFileContext(dataset, file)
        for check in perFileChecks:
            check(context)

    for check in datasetChecks:
        check(dataset)
```
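
To make the control flow concrete, here is a small self-contained variant of the loop in which the file tree is just a list of paths and each check returns a list of issue strings.
The check functions, the issue messages, and the dictionary-based contexts are all hypothetical stand-ins; they are not the validator's real checks.

```python
def build_dataset_context(paths):
    # Stand-in for buildDatasetContext: record files and subject directories.
    subjects = sorted({p.split("/")[1] for p in paths if p.startswith("/sub-")})
    return {"files": paths, "subjects": {"sub_dirs": subjects}}


def build_file_context(dataset, path):
    # Stand-in for buildFileContext: only the path is filled in here.
    return {"dataset": dataset, "path": path}


def check_no_spaces(context):
    # Hypothetical per-file check.
    return [f"space in path: {context['path']}"] if " " in context["path"] else []


def check_has_subjects(dataset):
    # Hypothetical dataset-level check.
    return [] if dataset["subjects"]["sub_dirs"] else ["no subject directories"]


def validate(paths, per_file_checks, dataset_checks):
    dataset = build_dataset_context(paths)
    issues = []
    # Per-file checks see a fresh context for every file.
    for path in dataset["files"]:
        context = build_file_context(dataset, path)
        for check in per_file_checks:
            issues.extend(check(context))
    # Dataset-level checks run once, after all files have been visited.
    for check in dataset_checks:
        issues.extend(check(dataset))
    return issues


issues = validate(
    ["/dataset_description.json", "/sub-01/anat/sub-01 T1w.nii.gz"],
    [check_no_spaces],
    [check_has_subjects],
)
```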

The following sections will describe [the validation context](context.md)
and our implementation of [the Inheritance Principle](inheritance-principle.md).

```{toctree}
:maxdepth: 1
:hidden:

context.md
inheritance-principle.md
```

[Python]: https://en.wikipedia.org/wiki/Python_(programming_language)
63 changes: 63 additions & 0 deletions docs/validation-model/inheritance-principle.md
@@ -0,0 +1,63 @@
# The Inheritance Principle

The [Inheritance Principle] is a core concept in BIDS.
Its original definition (edited for brevity) was:

> Any metadata file (`.json`, `.bvec`, `.tsv`, etc.) may be defined at any directory level,
> but no more than one applicable file may be defined at a given level.
> The values from the top level are inherited by all lower levels
> unless they are overridden by a file at the lower level. [...]
> There is no notion of "unsetting" a key/value pair.

Here, "top level" means dataset root, and "lower level" means closer to the data file
the metadata applies to.
More recent versions of the specification have made the language more precise at the cost
of verbosity.
The core concept remains the same.

The validator uses a "walk back" algorithm to find inherited files:

```python
def walkBack(file, extension):
    fileParts = parsePath(file.path)

    # Start in the directory containing the file and move toward the root
    fileTree = file.parent
    while fileTree:
        for child in fileTree.children:
            parts = parsePath(child.path)
            if (
                parts.extension == extension
                and parts.suffix == fileParts.suffix
                and isSubset(parts.entities, fileParts.entities)
            ):
                yield child

        fileTree = fileTree.parent
```

Using this basis, `loadSidecar` is simply:

Review comments on this line:

Contributor:
Where in the validator would we apply inheritance's `walkBack` to `.tsv` files? For example, if there is a top-level `sessions.tsv` and then `sub-01/sub-01_sessions.tsv`?

Contributor Author (@effigies, Nov 11, 2024):
BIDS does not have this concept yet, so the validator does not have it. We would need to define it to know whether `walkBack` is even applicable (augment `sub-01_sessions.tsv` with `sessions.tsv`), or if we need to construct one global table (augment `sessions.tsv` with `sub-*_sessions.tsv`).

Contributor:
To which `.tsv` files is that principle applied so far?
We list not only `.json` but also `.bvec` and `.tsv` files in https://github.com/bids-standard/bids-specification/blob/master/src/common-principles.md#the-inheritance-principle

Contributor Author:
Yes, those are the associated files; see a couple of lines down from here.

Contributor:
Thanks! I do not see any kind of "inheritance" operation there, though, unlike the `|` on `sidecar` here: it just returns the first file found instead of "building up" the value.

Contributor Author (@effigies, Nov 13, 2024):
From https://bids-specification.readthedocs.io/en/stable/common-principles.html#rules:

> For tabular files and other simple metadata files (for instance, bvec / bval files for diffusion MRI), accessing metadata associated with a data file MUST consider only the applicable file that is lowest in the filesystem hierarchy.

```python
def loadSidecar(file):
    sidecar = {}
    for json in walkBack(file, '.json'):
        # Order matters. `|` overrides the left side with the right.
        # Any collisions resolve in favor of closer to the data file.
        sidecar = loadJson(json) | sidecar
    return sidecar
```
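
The effect of the merge order can be checked with plain dictionaries.
In this sketch the three dictionaries stand in for JSON files found by `walkBack`, listed closest-first; the metadata values are made up for illustration.
Note that the `|` operator on dictionaries requires Python 3.9 or later.

```python
# Hypothetical sidecar contents, in the order walkBack would yield them:
# closest to the data file first, dataset root last.
found = [
    {"RepetitionTime": 2.0},                        # same directory as the data file
    {"RepetitionTime": 1.5, "EchoTime": 0.03},      # subject directory
    {"EchoTime": 0.05, "Manufacturer": "Example"},  # dataset root
]

sidecar = {}
for loaded in found:
    # Values already in `sidecar` came from closer files, so they win.
    sidecar = loaded | sidecar
```

After the loop, `RepetitionTime` comes from the closest file, `EchoTime` from the subject-level file, and `Manufacturer` from the root, since nothing closer defines it.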

For `loadAssociation`, only the first match is used, if found:

```python
def loadAssociation(file, association):
    for associated_file in walkBack(file, getExtension(association)):
        return getLoader(association)(associated_file)
```

Each association contains different metadata to extract.
Note that some associations have a different suffix from the files they are associated with.
The actual implementation of `walkBack` allows overriding suffixes as well as extensions,
but it would not be instructive to show here.
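
For readers who want to experiment, the following sketch implements the walk-back search over a flat list of paths instead of a file tree object.
It follows the idea of the pseudocode above, but `walk_back`, `split_name`, and the example file names are assumptions for illustration,
entity keys are left in their short filename form, and suffix overriding is not modeled.

```python
from pathlib import PurePosixPath


def split_name(path):
    """Return (entities, suffix, extension) for a BIDS-style filename."""
    stem, dot, ext = PurePosixPath(path).name.partition(".")
    *pairs, suffix = stem.split("_")
    entities = {k: v for k, _, v in (pair.partition("-") for pair in pairs)}
    return entities, suffix, dot + ext


def walk_back(paths, target, extension):
    """Yield matching files from the target's directory up to the root."""
    t_entities, t_suffix, _ = split_name(target)
    directory = PurePosixPath(target).parent
    while True:
        for path in sorted(paths):
            if PurePosixPath(path).parent != directory:
                continue
            entities, suffix, ext = split_name(path)
            if (
                ext == extension
                and suffix == t_suffix
                # isSubset: every entity of the candidate must match the target
                and entities.items() <= t_entities.items()
            ):
                yield path
        if directory == directory.parent:  # reached the root
            break
        directory = directory.parent


paths = [
    "/task-rest_bold.json",
    "/sub-01/sub-01_task-rest_bold.json",
    "/sub-01/func/sub-01_task-rest_bold.nii.gz",
    "/sub-02/sub-02_task-rest_bold.json",
]
target = "/sub-01/func/sub-01_task-rest_bold.nii.gz"

# Sidecar-style use: every applicable file, closest first.
matches = list(walk_back(paths, target, ".json"))
# Association-style use: only the first (lowest) applicable file.
first = next(walk_back(paths, target, ".json"), None)
```

The `sub-02` sidecar is never considered because the search only visits the target's own directory and its ancestors.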

[Inheritance Principle]: https://bids-specification.readthedocs.io/en/stable/common-principles.html#the-inheritance-principle