Fix intan `one-file-per-channel` #1601

h-mayorquin · 2024-11-28T19:21:37Z

The fix is composed of two steps:

Use "stream_name" as a reference to the channel maps instead of the stream_index
Sort the file names of in the one-file-per-channel case based on their channel names.

The first point also has the advantage that it removes the need to keep yet another numeric reference (that is not stream_id or stream_index) for the different data structures.

zm711 · 2024-12-02T13:47:23Z

Love the use of the dicts instead of the lists. I'm busy for the next couple days, but since this ~200 lines of change I'll try to carefully read this at some point this week so we can add this in. Unless Sam has the bandwidth before me :)

h-mayorquin · 2024-12-02T16:46:09Z

Thanks, @zm711

zm711

My only question is how do I know which stream index to use (ie if I as a previous user < 14.0 if this PR makes it....)? Like if I were stream_index 15 in my code previously would this be a breaking change? I can test if needed.

Otherwise I think this looks really great. A lot better--definitely more robust.

neo/rawio/intanrawio.py

zm711 · 2024-12-03T19:22:07Z

neo/rawio/intanrawio.py


    # we need time to be the last value
-    raw_file_paths_dict[15] = [Path(dirname / "time.dat")]
+    raw_file_paths_dict["timestamp"] = [Path(dirname / "time.dat")]


comment can be deleted above here since last value means nothing now :)

Good catch.

neo/rawio/intanrawio.py

h-mayorquin · 2024-12-04T17:48:24Z

Thanks @zm711 . I implemented your feedback.

My only question is how do I know which stream index to use (ie if I as a previous user < 14.0 if this PR makes it....)? Like if I were stream_index 15 in my code previously would this be a breaking change? I can test if needed.

Good point. My impression is that there is no contract in neo that stream indices have a fixed meaning. Ids and Names should be fixed but the stream indices depend on the specific instance of the format. Some context:

Code like this in spikeinterface and here in neo where the index is derived from the name:
https://github.com/SpikeInterface/spikeinterface/blob/853d8a48abcf40f5a47401ed2f7baa83e1c3295f/src/spikeinterface/extractors/neoextractors/neobaseextractor.py#L224-L225
If the indices are meant to be fixed, why do we need ids?
For Plexon, some streams are unavailable in some cases. At least in that case, stream_index does not have a fixed meaning.

All that said, it is unfortunate that the raw API for neo exposes the stream_index as the primary access method. My impression is that the normal io of neo and spikeinterface were the means to access the raw API and that the raw.io API was not designed to be relied on directly. My opinion is that either exposing stream_index directly or coupling it to the signal_streams structure was a mistake that was done because segment index and block index have natural order semantics. If this had been done with stream_id instead the following code (often used) would be unnecessary:

    def _get_analogsignal_chunk(
        self,
        block_index: int,
        seg_index: int,
        i_start: int | None,
        i_stop: int | None,
        stream_index: int,
        channel_indexes: list[int] | None,
    ):

        stream_id = self.header["signal_streams"][stream_index]["id"]
        buffer_id = self.header["signal_streams"][stream_index]["buffer_id"]

    def channel_id_to_index(self, stream_index: int, channel_ids: list[str]):
        """
        Inside a stream, transform channel_ids to channel_indexes.
        Based on self.header['signal_channels']
        channel_indexes are zero-based offsets within the stream

        Parameters
        ----------
        stream_index: int
            the stream index in which to convert the channel_ids to channel_indexes
        channel_ids: list[str]
            the list of channel_ids to convert to channel_indexes

        Returns
        -------
        channel_indexes: np.array[int]
             the channel_indexes associated with the given channel_ids
        """
        # unique ids is already checked in _check_stream_signal_channel_characteristics
        stream_id = self.header["signal_streams"][stream_index]["id"]
        mask = self.header["signal_channels"]["stream_id"] == stream_id

Yes, maybe something to discuss with the team. The thorny issue is that code like this is commonly use to get the stream_index:

    def test_sync(self):
        rawio_with_sync = SpikeGLXRawIO(self.get_local_path("spikeglx/NP2_with_sync"), load_sync_channel=True)
        rawio_with_sync.parse_header()
        stream_index = list(rawio_with_sync.header["signal_streams"]["name"]).index("imec0.ap")

It is hard to square that use with a variable number of streams (as in Plexon).

zm711 · 2024-12-05T13:12:28Z

Thanks @zm711 . I implemented your feedback.

Thanks yeah. I agree the design is a little tricky since the rawio exposes stream_index at the public level. I guess no particular stream_index is guaranteed, but in the case of Intan we have used the stream_index has a fossilized thing for so long (and I would say spikeglx, openephy, neuralynx, intan, and plexon are the most common ios we have used) that even though we don't have an official contract it might be disruptive to make a public api level change just in this case. Are you okay if we run this by Sam or the whole team before merging?

I guess to summarize my problem is that we don't actually have enough constraints on how to build a rawio and so different ios were build differently. I think this is a great improvement overall and will reduce mistakes. Could you run the intan test suite from spikeinterface on this branch and double check that it converts everything appropriately to make sure at the SI level we didn't make any assumptions :)

h-mayorquin · 2024-12-05T14:34:35Z

so long (and I would say spikeglx, openephy, neuralynx, intan, and plexon are the most common ios we have used) that even though we don't have an official contract it might be disruptive to make a public api level change just in this case. Are you okay if we run this by Sam or the whole team before merging?

Sure, but we can also just run the index on master right now and re-order the signal stream header so it matches what we had before. Could you check that? I can check the Spikeinterface but I think that over there we don't rely on stream index.

h-mayorquin · 2024-12-05T14:39:10Z

In fact, let's do just the re-ordering and then we don't need to check anything else as that would provide backward compatibility with what you had before. I will take care of it.

h-mayorquin · 2024-12-05T15:24:17Z

Oh, in fact we are already sorting so this does should keep that behavior:

python-neo/neo/rawio/intanrawio.py

Lines 246 to 265 in 2576037

    
           # we need to sort the data because the string of stream_index 10 is mis-sorted. 
        
           buffer_ids = [] 
        
           stream_ids_sorted = sorted([int(stream_id) for stream_id in stream_ids]) 
        
           signal_streams["id"] = [str(stream_id) for stream_id in stream_ids_sorted] 
        
           for stream_index, stream_id in enumerate(stream_ids_sorted): 
        
               if self.filename.suffix == ".rhd": 
        
                   name = stream_id_to_stream_name_rhd.get(int(stream_id), "") 
        
               else: 
        
                   name = stream_id_to_stream_name_rhs.get(int(stream_id), "") 
        
               signal_streams["name"][stream_index] = name 
        
               # zach I need you help here 
        
               if self.file_format == "header-attached": 
        
                   buffer_id = "" 
        
               elif self.file_format == "one-file-per-signal": 
        
                   buffer_id = stream_id 
        
                   buffer_ids.append(buffer_id) 
        
               elif self.file_format == "one-file-per-channel": 
        
                   buffer_id = "" 
        
               signal_streams["buffer_id"][stream_index] = buffer_id

I also double-checked the order of the streams on the signals header (which is what determines stream_index) and found no change between master and this PR. Here is the code and output:

from pathlib import Path
import subprocess
from neo.rawio import IntanRawIO
from pprint import pprint
import time 

neo_test_dir = Path("/home/heberto/ephy_testing_data/intan")
intan_test_cases_folder_paths = [p for p in neo_test_dir.iterdir() if p.is_dir()]

for test_case_folder_path in intan_test_cases_folder_paths:
    info_rhd_path = test_case_folder_path / "info.rhd"
    info_rhs_path = test_case_folder_path / "info.rhs"
    
    if info_rhd_path.exists() or info_rhs_path.exists():
        filename = info_rhd_path if info_rhd_path.exists() else info_rhs_path
        print("--------------------")
        print(test_case_folder_path.name)
        print(filename.suffix)
        # Switch to the master branch
        
        subprocess.run(["git", "switch", "master"], check=True)
        time.sleep(1) # Just so the print works
        reader = IntanRawIO(filename=filename)
        reader.parse_header()
        pprint(reader.header["signal_streams"])
        
        subprocess.run(["git", "switch", "fix_intan_one_file_per_channel"], check=True)
        time.sleep(1) # Just so the print works

        reader = IntanRawIO(filename=filename)
        reader.parse_header()
        pprint(reader.header["signal_streams"])

--------------------
rhd_fpc_multistim_240514_082044
.rhd
Your branch is up to date with 'origin/master'.
Already on 'master'
array([('RHD2000 amplifier channel', '0', ''),
       ('RHD2000 auxiliary input channel', '1', ''),
       ('USB board ADC input channel', '3', ''),
       ('USB board digital input channel', '4', ''),
       ('USB board digital output channel', '5', '')],
      dtype=[('name', '<U64'), ('id', '<U64'), ('buffer_id', '<U64')])
Your branch is up to date with 'origin/fix_intan_one_file_per_channel'.
Switched to branch 'fix_intan_one_file_per_channel'
array([('RHD2000 amplifier channel', '0', ''),
       ('RHD2000 auxiliary input channel', '1', ''),
       ('USB board ADC input channel', '3', ''),
       ('USB board digital input channel', '4', ''),
       ('USB board digital output channel', '5', '')],
      dtype=[('name', '<U64'), ('id', '<U64'), ('buffer_id', '<U64')])
--------------------
intan_fps_test_231117_052500
.rhd
Your branch is up to date with 'origin/master'.
Switched to branch 'master'
array([('RHD2000 amplifier channel', '0', '0'),
       ('RHD2000 auxiliary input channel', '1', '1'),
       ('USB board ADC input channel', '3', '3'),
       ('USB board digital input channel', '4', '4')],
      dtype=[('name', '<U64'), ('id', '<U64'), ('buffer_id', '<U64')])
Your branch is up to date with 'origin/fix_intan_one_file_per_channel'.
Switched to branch 'fix_intan_one_file_per_channel'
array([('RHD2000 amplifier channel', '0', '0'),
       ('RHD2000 auxiliary input channel', '1', '1'),
       ('USB board ADC input channel', '3', '3'),
       ('USB board digital input channel', '4', '4')],
      dtype=[('name', '<U64'), ('id', '<U64'), ('buffer_id', '<U64')])
--------------------
intan_fpc_test_231117_052630
.rhd
Your branch is up to date with 'origin/master'.
Switched to branch 'master'
array([('RHD2000 amplifier channel', '0', ''),
       ('RHD2000 auxiliary input channel', '1', ''),
       ('USB board ADC input channel', '3', ''),
       ('USB board digital input channel', '4', '')],
      dtype=[('name', '<U64'), ('id', '<U64'), ('buffer_id', '<U64')])
Your branch is up to date with 'origin/fix_intan_one_file_per_channel'.
Switched to branch 'fix_intan_one_file_per_channel'
array([('RHD2000 amplifier channel', '0', ''),
       ('RHD2000 auxiliary input channel', '1', ''),
       ('USB board ADC input channel', '3', ''),
       ('USB board digital input channel', '4', '')],
      dtype=[('name', '<U64'), ('id', '<U64'), ('buffer_id', '<U64')])
--------------------
intan_fps_rhs_test_240329_091536
.rhs
Your branch is up to date with 'origin/master'.
Switched to branch 'master'
array([('RHS2000 amplifier channel', '0', '0'),
       ('USB board ADC input channel', '3', '3'),
       ('USB board ADC output channel', '4', '4'),
       ('USB board digital input channel', '5', '5'),
       ('USB board digital output channel', '6', '6'),
       ('DC Amplifier channel', '10', '10'), ('Stim channel', '11', '11')],
      dtype=[('name', '<U64'), ('id', '<U64'), ('buffer_id', '<U64')])
Your branch is up to date with 'origin/fix_intan_one_file_per_channel'.
Switched to branch 'fix_intan_one_file_per_channel'
array([('RHS2000 amplifier channel', '0', '0'),
       ('USB board ADC input channel', '3', '3'),
       ('USB board ADC output channel', '4', '4'),
       ('USB board digital input channel', '5', '5'),
       ('USB board digital output channel', '6', '6'),
       ('DC Amplifier channel', '10', '10'), ('Stim channel', '11', '11')],
      dtype=[('name', '<U64'), ('id', '<U64'), ('buffer_id', '<U64')])
--------------------
intan_fpc_rhs_test_240329_091637
.rhs
Your branch is up to date with 'origin/master'.
Switched to branch 'master'
/home/heberto/development/python-neo/neo/rawio/intanrawio.py:826: UserWarning: Stim not implemented for `one-file-per-channel` due to lack of test files
  warnings.warn("Stim not implemented for `one-file-per-channel` due to lack of test files")
Switched to branch 'fix_intan_one_file_per_channel'
array([('RHS2000 amplifier channel', '0', ''),
       ('USB board ADC input channel', '3', ''),
       ('USB board ADC output channel', '4', ''),
       ('USB board digital input channel', '5', ''),
       ('USB board digital output channel', '6', ''),
       ('DC Amplifier channel', '10', '')],
      dtype=[('name', '<U64'), ('id', '<U64'), ('buffer_id', '<U64')])
Your branch is up to date with 'origin/fix_intan_one_file_per_channel'.
array([('RHS2000 amplifier channel', '0', ''),
       ('USB board ADC input channel', '3', ''),
       ('USB board ADC output channel', '4', ''),
       ('USB board digital input channel', '5', ''),
       ('USB board digital output channel', '6', ''),
       ('DC Amplifier channel', '10', '')],
      dtype=[('name', '<U64'), ('id', '<U64'), ('buffer_id', '<U64')])
/home/heberto/development/python-neo/neo/rawio/intanrawio.py:826: UserWarning: Stim not implemented for `one-file-per-channel` due to lack of test files
  warnings.warn("Stim not implemented for `one-file-per-channel` due to lack of test files")

zm711

I'm good with this then. I'll ping Sam to look at our discussion, but as long as this maintains behavior then the discussion is more for future development than this PR. Thanks Heberto for putting up with me :)

h-mayorquin · 2024-12-05T16:37:45Z

I branched the index discussion as a separate issue because it seems like an important thing:

#1604

zm711

One last comment then ready to merge @h-mayorquin .

zm711 · 2024-12-13T15:24:33Z

neo/test/rawiotest/test_intanrawio.py

+        channel_names = [p.name[4:-4] for p in amplifier_file_paths]
+
+        for channel_name, amplifier_file_path in zip(channel_names, amplifier_file_paths):
+            data_raw = np.fromfile(amplifier_file_path, dtype=np.int16).squeeze()
+            data_from_neo = intan_reader.get_analogsignal_chunk(channel_ids=[channel_name], stream_index=0).squeeze()
+            np.testing.assert_allclose(data_raw, data_from_neo)


Could we get a note in comment about why this works...

so p.name[4:-4] means nothing to someone who doesn't know the file format. So I would explain why 4:-4. And if you're feeling ambitious maybe a better name than p (since I know a certain developer who hates one letter variable names....).

Good idea. I was lazy here.

zm711 · 2024-12-13T18:28:02Z

Sorry I merged main into the branch, but this bug that is fixed on main is now popping up on RTD here. It appeared on the PR we just merged. Sorry let me try to track it down.

h-mayorquin · 2024-12-13T18:56:21Z

git log revert?

zm711 · 2024-12-13T18:59:49Z

It broke your other PR too. So I would love to forget it out. You want to look at the log and see if you see anything ? To me it looks like the code should be fine, so I don't know why it is using an old version of the plexon code.... I've been googling to see if there is a cache I can empty but seems like RTD doesn't make it easy.

zm711 · 2024-12-13T19:12:10Z

Okay I found another Plexon overflow that looked the same as the last one so I didn't even realize it was a different one which had messed with me. Once we test and merge #1612 we can merge main into your PRs and that should fix everything :)

h-mayorquin · 2024-12-13T22:59:38Z

I guessed that Pleon will bite you, one way or the other.

samuelgarcia · 2025-01-06T15:24:48Z

Hi Ramon
This is a big clean. thank you.
OK for dict approach.

h-mayorquin added 3 commits November 26, 2024 14:07

add failing test

2d62f2a

fix

b1d7a8d

eliminate unecessary loop

28fdcec

zm711 reviewed Dec 3, 2024

View reviewed changes

zach feedback

31b92c4

h-mayorquin mentioned this pull request Dec 5, 2024

[Discussion] Consistency of stream_index in neo.rawio #1604

Open

zm711 approved these changes Dec 5, 2024

View reviewed changes

Merge branch 'master' into fix_intan_one_file_per_channel

ce69f51

zm711 added this to the 0.14.0 milestone Dec 6, 2024

zm711 reviewed Dec 13, 2024

View reviewed changes

zm711 self-assigned this Dec 13, 2024

improve tests

c158e55

zm711 approved these changes Dec 13, 2024

View reviewed changes

Merge branch 'master' into fix_intan_one_file_per_channel

ba7e716

Merge branch 'master' into fix_intan_one_file_per_channel

a64ffbf

zm711 merged commit 45ec350 into NeuralEnsemble:master Dec 14, 2024
3 checks passed

h-mayorquin deleted the fix_intan_one_file_per_channel branch January 7, 2025 15:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix intan `one-file-per-channel` #1601

Fix intan `one-file-per-channel` #1601

h-mayorquin commented Nov 28, 2024 •

edited

Loading

zm711 commented Dec 2, 2024

h-mayorquin commented Dec 2, 2024

zm711 left a comment

zm711 Dec 3, 2024

h-mayorquin Dec 4, 2024

h-mayorquin commented Dec 4, 2024 •

edited

Loading

zm711 commented Dec 5, 2024

h-mayorquin commented Dec 5, 2024

h-mayorquin commented Dec 5, 2024 •

edited

Loading

h-mayorquin commented Dec 5, 2024 •

edited

Loading

zm711 left a comment

h-mayorquin commented Dec 5, 2024

zm711 left a comment

zm711 Dec 13, 2024

h-mayorquin Dec 13, 2024

h-mayorquin Dec 13, 2024

zm711 commented Dec 13, 2024

h-mayorquin commented Dec 13, 2024

zm711 commented Dec 13, 2024

zm711 commented Dec 13, 2024

h-mayorquin commented Dec 13, 2024

samuelgarcia commented Jan 6, 2025

Fix intan one-file-per-channel #1601

Fix intan one-file-per-channel #1601

Conversation

h-mayorquin commented Nov 28, 2024 • edited Loading

zm711 commented Dec 2, 2024

h-mayorquin commented Dec 2, 2024

zm711 left a comment

Choose a reason for hiding this comment

zm711 Dec 3, 2024

Choose a reason for hiding this comment

h-mayorquin Dec 4, 2024

Choose a reason for hiding this comment

h-mayorquin commented Dec 4, 2024 • edited Loading

zm711 commented Dec 5, 2024

h-mayorquin commented Dec 5, 2024

h-mayorquin commented Dec 5, 2024 • edited Loading

h-mayorquin commented Dec 5, 2024 • edited Loading

zm711 left a comment

Choose a reason for hiding this comment

h-mayorquin commented Dec 5, 2024

zm711 left a comment

Choose a reason for hiding this comment

zm711 Dec 13, 2024

Choose a reason for hiding this comment

h-mayorquin Dec 13, 2024

Choose a reason for hiding this comment

h-mayorquin Dec 13, 2024

Choose a reason for hiding this comment

zm711 commented Dec 13, 2024

h-mayorquin commented Dec 13, 2024

zm711 commented Dec 13, 2024

zm711 commented Dec 13, 2024

h-mayorquin commented Dec 13, 2024

samuelgarcia commented Jan 6, 2025

Fix intan `one-file-per-channel` #1601

Fix intan `one-file-per-channel` #1601

h-mayorquin commented Nov 28, 2024 •

edited

Loading

h-mayorquin commented Dec 4, 2024 •

edited

Loading

h-mayorquin commented Dec 5, 2024 •

edited

Loading

h-mayorquin commented Dec 5, 2024 •

edited

Loading