fix trialization #130

jihyunbak · 2022-02-22T06:33:31Z

Description and related issues

This grew into a large PR, but the major changes are the following:

Stimulus metadata are now integrated into the metadata/resources/list_of_stimuli.yaml file in this package. The <stim_name>.yaml files in the metadata repo are no longer used.
Trialization-related values in the stimulus metadata are fixed.
All tokenizers fixed.
StimValueExtractor is deleted; instead stimulus parameters are loaded directly in ToneTokenizer and TIMITTokenizer.
Introduced RecManager, which keeps the TDT reader. Previously we were constructing TDTReader twice, once for NeuralDataOriginator and once for StimulusOriginator/MarkManager; this was time-consuming. Now RecManager is initialized only once and is passed to the neural/stimulus originators.
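
A minimal sketch of the "construct the reader once and share it" pattern described in the last item. All class names below are hypothetical stand-ins and do not reproduce the actual RecManager, TDTReader, or originator interfaces in this package:

```python
class FakeTDTReader:
    """Stand-in for a reader that is expensive to construct (hypothetical)."""
    n_constructed = 0  # counts how many times a reader was built

    def __init__(self, block_name):
        FakeTDTReader.n_constructed += 1
        self.block_name = block_name


class RecManagerSketch:
    """Owns the single reader instance (simplified, hypothetical)."""

    def __init__(self, block_name):
        self.reader = FakeTDTReader(block_name)


class NeuralOriginatorSketch:
    def __init__(self, rec_manager):
        # Reuse the shared reader instead of building a new one.
        self.reader = rec_manager.reader


class StimulusOriginatorSketch:
    def __init__(self, rec_manager):
        self.reader = rec_manager.reader


rec_manager = RecManagerSketch("RVG06_B03")
neural = NeuralOriginatorSketch(rec_manager)
stimulus = StimulusOriginatorSketch(rec_manager)
```

Both originators end up holding the same reader object, so the costly construction happens once per block.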

This PR closes these issues:

Checklist:

Did not run the tests in the test folder, but this branch successfully generates nwb for wn2 (RVG06_B03), tone (RVG06_B05), timit (RVG06_B08) and dmr (RVG06_B09) blocks.

All tests pass on catscan: run pytest --basetemp=tmp -sv -n 8 tests on catscan from the root directory
If needed, docs have been updated: docs/source has been updated for any added, moved, or removed files
Docs build with no errors: run make clean & make html from the docs folder
No python formatting errors: run flake8 nsds_lab_to_nwb tests from the root directory


jthermiz · 2022-02-25T18:18:58Z

@jihyunbak

I tried to run this PR on RVG16_B01 (wn2) and I'm getting 182 trials = 60 stimuli + 120 baseline + 2 extra baseline in the beginning

It looks like there is a baseline period that starts right after the stimulus. I think we want to space out the baseline farther away from the stimulus offset to ensure it's a true baseline.

I have not yet inspected whether the marks are at the correct times.

I also tried running this PR on RVG16_B06 (tone) and got a similar issue -- ~50% more trials than expected due to extra baseline trials.
Re: "Stimulus metadata are now integrated into the metadata/resources/list_of_stimuli.yaml file in this package. The <stim_name>.yaml files in the metadata repo are no longer used"

Is this just a one-off change, or are we moving away from the metadata repo completely and integrating everything into the nwb repo?

jihyunbak · 2022-02-25T19:30:18Z

@jthermiz Thanks for reviewing!

Re 1, 2) Split baseline trials: that's just because I worked in terms of between-marks intervals. When the stimulus is located in the middle of the between-marks interval, the current code creates two baseline trials, one before and one after the stimulus, so the post-prev-stim baseline and the pre-next-stim baseline are actually contiguous. If this is inconvenient, it is simple to change the code to create only (1 stim trial + 1 baseline trial) per mark, plus 2 optional extra baseline trials at the beginning/end of the recording. I will implement this.
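
To illustrate why working per between-marks interval doubles the baseline count, here is a rough sketch. The function name, the mark times, and the stimulus position inside each interval are made-up assumptions for illustration, not the tokenizer code in this PR:

```python
def split_interval(start, stop, stim_on, stim_off):
    """Split one between-marks interval into baseline / stimulus / baseline."""
    return [
        (start, stim_on, "baseline"),
        (stim_on, stim_off, "stimulus"),
        (stim_off, stop, "baseline"),
    ]


# Hypothetical marks 1.0 s apart; the last value closes the final interval.
marks = [1.0, 2.0, 3.0, 4.0]

trials = []
for start, stop in zip(marks[:-1], marks[1:]):
    # Assumed: the stimulus sits in the middle of each interval.
    trials += split_interval(start, stop, start + 0.4, start + 0.6)

n_stim = sum(kind == "stimulus" for _, _, kind in trials)
n_base = sum(kind == "baseline" for _, _, kind in trials)
```

With three intervals this gives three stimulus trials and six baseline trials, and the baseline that closes one interval ends exactly where the baseline that opens the next one begins.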

Re 2) the baseline trial being too close to the stimulus:

It looks like there is a baseline period that starts right after the stimulus. I think we want to space out the baseline farther away from the stimulus offset to ensure it's a true baseline.

This is a really good point. I just don't have a good enough sense of this issue to decide how far from the stimulus the baseline should start. Let's discuss more in the Slack channel!

Re 3) Location of stimulus metadata: For now, I think I will keep the stimulus metadata file list_of_stimuli.yaml in this code repository and deprecate the old stimulus metadata in the metadata repo, for two reasons:

It makes sense to merge the information in the previous list_of_stimuli.yaml (with the audio/mark/parameter paths) and the information in the previous <stim_name>.yaml in the metadata repo.
We are using specialized tokenizers for each stimulus type anyway, so the stimulus metadata (unlike the probe metadata) are actually more tightly coupled with the codebase. Also, during the current development phase, it helps to see all changes together in the code repo. Perhaps we could decide to move list_of_stimuli.yaml back to the metadata repo later.

But the metadata repo still has the probe metadata (electrode coordinates etc.), which can stay away from the codebase.

jthermiz · 2022-03-01T23:34:22Z

Sounds good @jihyunbak.

Closing the loop on 2). For the discrete stimuli like white noise and tone, there appears to be consensus for this:

The baseline trial centered about this point:

baseline center = stim offset + (next stim onset - stim offset)/2
And that baseline duration = stimulus duration
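
As a quick sketch of the rule above (the function name and the overlap check are my own assumptions, not code from this PR):

```python
def centered_baseline(stim_offset, next_stim_onset, stim_duration):
    """Return (start, stop) of a baseline window centered in the inter-stimulus gap.

    center   = stim_offset + (next_stim_onset - stim_offset) / 2
    duration = stim_duration
    """
    gap = next_stim_onset - stim_offset
    if stim_duration > gap:
        # Assumed guard: the window would overlap a stimulus period.
        raise ValueError("baseline window does not fit between stimuli")
    center = stim_offset + gap / 2
    half = stim_duration / 2
    return center - half, center + half


# Example: stimulus ends at 1.0 s, next one starts at 2.0 s, stimuli last 0.5 s.
start, stop = centered_baseline(1.0, 2.0, 0.5)
```

In this example the baseline window runs from 1.25 s to 1.75 s, leaving 0.25 s of margin on each side.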

@JesseLivezey also said:

timit would probably need different logic
For DMR we would probably need to have baseline times at the beginning and end


jihyunbak · 2022-03-03T02:18:27Z

Thanks @jthermiz! Follow-up:

Baseline fixes:

For tone and wn, now there are exactly (N+2) baseline trials with N stimulus trials. The two extra baselines are at the beginning/end of the recording. Also implemented @jthermiz 's suggested baseline structure (centered between two stimulus periods) for tone and wn stimuli. See list_of_stimuli.yaml for baseline_start and baseline_end values.
For timit and dmr, there should be N stimulus trials (N=998 for timit, N=1 for dmr) plus 2 baseline trials at the beginning/end of the recording. The last baseline period starts baseline_start = 3.0 s after the end of the last stimulus trial. This value of 3.0 s is an arbitrary choice and is set in list_of_stimuli.yaml.
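
A rough sketch of the timit/dmr layout described above. The function name and arguments are hypothetical, and the assumption that the first baseline spans from the recording start to the first stimulus onset is mine; only the baseline_start = 3.0 s delay comes from the description:

```python
def continuous_stim_trials(stim_onsets, stim_offsets, rec_end, baseline_start=3.0):
    """Build N stimulus trials plus one baseline at each end of the recording."""
    # Assumed: the first baseline runs from recording start to the first onset.
    first_baseline = (0.0, stim_onsets[0], "baseline")
    stim_trials = [
        (on, off, "stimulus") for on, off in zip(stim_onsets, stim_offsets)
    ]
    # The last baseline starts baseline_start seconds after the last stimulus ends.
    last_baseline = (stim_offsets[-1] + baseline_start, rec_end, "baseline")
    return [first_baseline, *stim_trials, last_baseline]


# Made-up times in seconds, for illustration only.
trials = continuous_stim_trials(
    stim_onsets=[10.0, 20.0],
    stim_offsets=[15.0, 25.0],
    rec_end=40.0,
)
```

Here two stimulus trials produce four trials in total, and the final baseline covers 28.0 s to 40.0 s.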

Additional changes:

Introduced RecManager, which keeps the TDT reader. Previously we were constructing TDTReader twice, once for NeuralDataOriginator and once for StimulusOriginator/MarkManager; this was time-consuming. Now RecManager is initialized only once and is passed to the neural/stimulus originators.

jihyunbak added 22 commits February 18, 2022 11:44

simplify tokenize method for continuous stim

26949b4

update .gitignore

802ef40

rename internal variable mark_onsets to mark_events

97eb3f5

add logging messages for debugging

4e8e6f1

pass audio_play_length with stimulus metadata

59f9fe2

update inputs to _tokenize method

a48a2e8

copy stimulus metadata into list_of_stimuli.yaml

cfb6c4b

get stimulus metadata from list_of_stimuli.yaml

1a74c8b

reorder stimuli by alphabetical order

fbf69b8

fix stim metadata for trialization

b87c361

update trial tokenization

5a20ec9

update logging

5da2527

fix mark event detection

a3ecc28

fix mark detection for wn2 stimulus

a4db2cb

- need a lower mark threshold (first mark is smaller) - force manual mark detection

remove old, ad hoc methods for wn tokenization

1418ea5

simplify _validate_num_stim_onsets

319d0fa

deprecate stim metadata mark_offset

3cdc363

(now always zero)

remove wn2 stim metadata stim_values

891c9d9

load stim parameter values in TIMIT and Tone tokenizers

6a10c03

pass mark time series directly to trials manager

818fe01

clarify logics for mark/stimulus starting time

d03f07b

minor fixes

946d09f

jihyunbak mentioned this pull request Feb 22, 2022

incorrect metadata values for stim trial structures #129

Closed

jihyunbak added 2 commits February 21, 2022 22:43

update dmr tokenization, reflecting the 60s back padding

c32bcd2

drop unused stim metadata mark_sampling_rate

1634bf7

jihyunbak mentioned this pull request Feb 22, 2022

Incorrect number of trials for RVG16_B01 #114

Closed

jihyunbak requested review from jthermiz and JesseLivezey February 22, 2022 07:01

jihyunbak added 8 commits March 2, 2022 15:53

bring back mark_offset, fix baseline definitions back to previous behavior

ef76e48

move mark event detection to MarkManager

910c737

change baseline periods for tone and wn

5772d59

add baseline_start value for timit and dmr tokenization

51604d8

introduce RecManager

66ff9c3

minor cleanup

7b99a7f

do add_stimulus to nwb in StimulusOriginator

3c7d83c

pass stimulus file paths with other stimulus metadata

36ced46

jihyunbak added 3 commits March 2, 2022 21:39

make stimulus utils module

a75bcc3

fix mark detection after interactive check

af96ce2

update docs

ec63761

jihyunbak merged commit 3e1a0f8 into main Mar 3, 2022

jihyunbak deleted the fix-trialization branch March 5, 2022 18:19
