Limit TSV loads to a manageable number of lines #138

Open
effigies opened this issue Jan 7, 2025 · 0 comments · May be fixed by #139
Labels
enhancement New feature or request

Comments

@effigies
Contributor

effigies commented Jan 7, 2025

Looking at a dataset:

$ wc -l sub-20950/beh/sub-20950_task-*.tsv
     457 sub-20950/beh/sub-20950_task-StopSignal_beh.tsv
  315803 sub-20950/beh/sub-20950_task-antisaccade_beh.tsv
     337 sub-20950/beh/sub-20950_task-antisaccade_events.tsv
     154 sub-20950/beh/sub-20950_task-axcpt_beh.tsv
     241 sub-20950/beh/sub-20950_task-flanker_beh.tsv
  316992 total

If `**/*antisaccade_beh.tsv` is listed in `.bidsignore`, validation runs in under a second. If not, it takes at least several minutes (I canceled the run, so I don't know the total time).

I think it's reasonable to type-check only the first 1000 lines and assume the rest are fine. We could make the limit a configurable parameter exposed in both the API and the CLI: -1 would restore the current behavior (load every row), 0 would load only the header, and a positive number N would load N rows beyond the header.
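A minimal sketch of the proposed -1 / 0 / N convention, in TypeScript. The names `loadTsvRows`, `iterLines`, `TsvPreview`, and `maxRows` are hypothetical and do not correspond to the validator's actual API. The sketch assumes the file contents are already available as a string, and the default of 1000 data rows is an assumption. Because lines are produced lazily, rows past the limit are never split, so only the returned rows would go on to be type-checked.

```typescript
// Hypothetical sketch: loadTsvRows, iterLines, TsvPreview, and maxRows are
// illustrative names, not the validator's real API.
// Proposed maxRows semantics:
//   -1     -> load every row (current behavior)
//    0     -> load only the header
//    N > 0 -> load the header plus at most N data rows

interface TsvPreview {
  header: string[];
  rows: string[][];
  truncated: boolean; // true if at least one row past the limit was skipped
}

// Lazily yield lines from a string, so lines that are never requested are never split.
function* iterLines(text: string): Generator<string> {
  let start = 0;
  while (start < text.length) {
    let end = text.indexOf("\n", start);
    if (end === -1) end = text.length;
    let line = text.slice(start, end);
    if (line.endsWith("\r")) line = line.slice(0, -1); // tolerate CRLF line endings
    yield line;
    start = end + 1;
  }
}

function loadTsvRows(lines: Iterable<string>, maxRows = 1000): TsvPreview {
  if (!Number.isInteger(maxRows) || maxRows < -1) {
    throw new RangeError(`maxRows must be -1, 0, or a positive integer, got ${maxRows}`);
  }
  const it = lines[Symbol.iterator]();
  const first = it.next();
  const header: string[] = first.done ? [] : first.value.split("\t");
  const rows: string[][] = [];
  let truncated = false;
  for (let next = it.next(); !next.done; next = it.next()) {
    if (maxRows !== -1 && rows.length >= maxRows) {
      truncated = true; // stop here; the remaining lines are never parsed
      break;
    }
    rows.push(next.value.split("\t"));
  }
  return { header, rows, truncated };
}

// Example usage: keep only the first 2 data rows of a 3-row file.
const tsv = "onset\tduration\n0.5\t1\n1.5\t1\n2.5\t1\n";
const preview = loadTsvRows(iterLines(tsv), 2);
console.log(preview.rows.length, preview.truncated); // 2 true
```

Returning a `truncated` flag, rather than just the rows, lets a caller tell whether anything was skipped, for example to report that only part of a file was checked.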
