Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix intan one-file-per-channel #1601

Merged

Conversation

h-mayorquin
Copy link
Contributor

@h-mayorquin h-mayorquin commented Nov 28, 2024

Fix #1599

The fix is composed of two steps:

  1. Use "stream_name" as a reference to the channel maps instead of the stream_index
  2. Sort the file names of in the one-file-per-channel case based on their channel names.

The first point also has the advantage that it removes the need to keep yet another numeric reference (that is not stream_id or stream_index) for the different data structures.

@zm711
Copy link
Contributor

zm711 commented Dec 2, 2024

Love the use of the dicts instead of the lists. I'm busy for the next couple days, but since this ~200 lines of change I'll try to carefully read this at some point this week so we can add this in. Unless Sam has the bandwidth before me :)

@h-mayorquin
Copy link
Contributor Author

Thanks, @zm711

Copy link
Contributor

@zm711 zm711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only question is how do I know which stream index to use (ie if I as a previous user < 14.0 if this PR makes it....)? Like if I were stream_index 15 in my code previously would this be a breaking change? I can test if needed.

Otherwise I think this looks really great. A lot better--definitely more robust.

neo/rawio/intanrawio.py Outdated Show resolved Hide resolved
neo/rawio/intanrawio.py Outdated Show resolved Hide resolved
neo/rawio/intanrawio.py Show resolved Hide resolved
neo/rawio/intanrawio.py Show resolved Hide resolved

# we need time to be the last value
raw_file_paths_dict[15] = [Path(dirname / "time.dat")]
raw_file_paths_dict["timestamp"] = [Path(dirname / "time.dat")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment can be deleted above here since last value means nothing now :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

neo/rawio/intanrawio.py Show resolved Hide resolved
@h-mayorquin
Copy link
Contributor Author

h-mayorquin commented Dec 4, 2024

Thanks @zm711 . I implemented your feedback.

My only question is how do I know which stream index to use (ie if I as a previous user < 14.0 if this PR makes it....)? Like if I were stream_index 15 in my code previously would this be a breaking change? I can test if needed.

Good point. My impression is that there is no contract in neo that stream indices have a fixed meaning. Ids and Names should be fixed but the stream indices depend on the specific instance of the format. Some context:

  1. Code like this in spikeinterface and here in neo where the index is derived from the name:
    https://github.com/SpikeInterface/spikeinterface/blob/853d8a48abcf40f5a47401ed2f7baa83e1c3295f/src/spikeinterface/extractors/neoextractors/neobaseextractor.py#L224-L225

  2. If the indices are meant to be fixed, why do we need ids?

  3. For Plexon, some streams are unavailable in some cases. At least in that case, stream_index does not have a fixed meaning.

All that said, it is unfortunate that the raw API for neo exposes the stream_index as the primary access method. My impression is that the normal io of neo and spikeinterface were the means to access the raw API and that the raw.io API was not designed to be relied on directly. My opinion is that either exposing stream_index directly or coupling it to the signal_streams structure was a mistake that was done because segment index and block index have natural order semantics. If this had been done with stream_id instead the following code (often used) would be unnecessary:

    def _get_analogsignal_chunk(
        self,
        block_index: int,
        seg_index: int,
        i_start: int | None,
        i_stop: int | None,
        stream_index: int,
        channel_indexes: list[int] | None,
    ):

        stream_id = self.header["signal_streams"][stream_index]["id"]
        buffer_id = self.header["signal_streams"][stream_index]["buffer_id"]
    def channel_id_to_index(self, stream_index: int, channel_ids: list[str]):
        """
        Inside a stream, transform channel_ids to channel_indexes.
        Based on self.header['signal_channels']
        channel_indexes are zero-based offsets within the stream

        Parameters
        ----------
        stream_index: int
            the stream index in which to convert the channel_ids to channel_indexes
        channel_ids: list[str]
            the list of channel_ids to convert to channel_indexes

        Returns
        -------
        channel_indexes: np.array[int]
             the channel_indexes associated with the given channel_ids
        """
        # unique ids is already checked in _check_stream_signal_channel_characteristics
        stream_id = self.header["signal_streams"][stream_index]["id"]
        mask = self.header["signal_channels"]["stream_id"] == stream_id

Yes, maybe something to discuss with the team. The thorny issue is that code like this is commonly use to get the stream_index:

    def test_sync(self):
        rawio_with_sync = SpikeGLXRawIO(self.get_local_path("spikeglx/NP2_with_sync"), load_sync_channel=True)
        rawio_with_sync.parse_header()
        stream_index = list(rawio_with_sync.header["signal_streams"]["name"]).index("imec0.ap")

It is hard to square that use with a variable number of streams (as in Plexon).

@zm711
Copy link
Contributor

zm711 commented Dec 5, 2024

Thanks @zm711 . I implemented your feedback.

Thanks yeah. I agree the design is a little tricky since the rawio exposes stream_index at the public level. I guess no particular stream_index is guaranteed, but in the case of Intan we have used the stream_index has a fossilized thing for so long (and I would say spikeglx, openephy, neuralynx, intan, and plexon are the most common ios we have used) that even though we don't have an official contract it might be disruptive to make a public api level change just in this case. Are you okay if we run this by Sam or the whole team before merging?

I guess to summarize my problem is that we don't actually have enough constraints on how to build a rawio and so different ios were build differently. I think this is a great improvement overall and will reduce mistakes. Could you run the intan test suite from spikeinterface on this branch and double check that it converts everything appropriately to make sure at the SI level we didn't make any assumptions :)

@h-mayorquin
Copy link
Contributor Author

so long (and I would say spikeglx, openephy, neuralynx, intan, and plexon are the most common ios we have used) that even though we don't have an official contract it might be disruptive to make a public api level change just in this case. Are you okay if we run this by Sam or the whole team before merging?

Sure, but we can also just run the index on master right now and re-order the signal stream header so it matches what we had before. Could you check that? I can check the Spikeinterface but I think that over there we don't rely on stream index.

@h-mayorquin
Copy link
Contributor Author

h-mayorquin commented Dec 5, 2024

In fact, let's do just the re-ordering and then we don't need to check anything else as that would provide backward compatibility with what you had before. I will take care of it.

@h-mayorquin
Copy link
Contributor Author

h-mayorquin commented Dec 5, 2024

Oh, in fact we are already sorting so this does should keep that behavior:

# we need to sort the data because the string of stream_index 10 is mis-sorted.
buffer_ids = []
stream_ids_sorted = sorted([int(stream_id) for stream_id in stream_ids])
signal_streams["id"] = [str(stream_id) for stream_id in stream_ids_sorted]
for stream_index, stream_id in enumerate(stream_ids_sorted):
if self.filename.suffix == ".rhd":
name = stream_id_to_stream_name_rhd.get(int(stream_id), "")
else:
name = stream_id_to_stream_name_rhs.get(int(stream_id), "")
signal_streams["name"][stream_index] = name
# zach I need you help here
if self.file_format == "header-attached":
buffer_id = ""
elif self.file_format == "one-file-per-signal":
buffer_id = stream_id
buffer_ids.append(buffer_id)
elif self.file_format == "one-file-per-channel":
buffer_id = ""
signal_streams["buffer_id"][stream_index] = buffer_id

I also double-checked the order of the streams on the signals header (which is what determines stream_index) and found no change between master and this PR. Here is the code and output:

from pathlib import Path
import subprocess
from neo.rawio import IntanRawIO
from pprint import pprint
import time 

neo_test_dir = Path("/home/heberto/ephy_testing_data/intan")
intan_test_cases_folder_paths = [p for p in neo_test_dir.iterdir() if p.is_dir()]

for test_case_folder_path in intan_test_cases_folder_paths:
    info_rhd_path = test_case_folder_path / "info.rhd"
    info_rhs_path = test_case_folder_path / "info.rhs"
    
    if info_rhd_path.exists() or info_rhs_path.exists():
        filename = info_rhd_path if info_rhd_path.exists() else info_rhs_path
        print("--------------------")
        print(test_case_folder_path.name)
        print(filename.suffix)
        # Switch to the master branch
        
        subprocess.run(["git", "switch", "master"], check=True)
        time.sleep(1) # Just so the print works
        reader = IntanRawIO(filename=filename)
        reader.parse_header()
        pprint(reader.header["signal_streams"])
        
        subprocess.run(["git", "switch", "fix_intan_one_file_per_channel"], check=True)
        time.sleep(1) # Just so the print works

        reader = IntanRawIO(filename=filename)
        reader.parse_header()
        pprint(reader.header["signal_streams"])

--------------------
rhd_fpc_multistim_240514_082044
.rhd
Your branch is up to date with 'origin/master'.
Already on 'master'
array([('RHD2000 amplifier channel', '0', ''),
       ('RHD2000 auxiliary input channel', '1', ''),
       ('USB board ADC input channel', '3', ''),
       ('USB board digital input channel', '4', ''),
       ('USB board digital output channel', '5', '')],
      dtype=[('name', '<U64'), ('id', '<U64'), ('buffer_id', '<U64')])
Your branch is up to date with 'origin/fix_intan_one_file_per_channel'.
Switched to branch 'fix_intan_one_file_per_channel'
array([('RHD2000 amplifier channel', '0', ''),
       ('RHD2000 auxiliary input channel', '1', ''),
       ('USB board ADC input channel', '3', ''),
       ('USB board digital input channel', '4', ''),
       ('USB board digital output channel', '5', '')],
      dtype=[('name', '<U64'), ('id', '<U64'), ('buffer_id', '<U64')])
--------------------
intan_fps_test_231117_052500
.rhd
Your branch is up to date with 'origin/master'.
Switched to branch 'master'
array([('RHD2000 amplifier channel', '0', '0'),
       ('RHD2000 auxiliary input channel', '1', '1'),
       ('USB board ADC input channel', '3', '3'),
       ('USB board digital input channel', '4', '4')],
      dtype=[('name', '<U64'), ('id', '<U64'), ('buffer_id', '<U64')])
Your branch is up to date with 'origin/fix_intan_one_file_per_channel'.
Switched to branch 'fix_intan_one_file_per_channel'
array([('RHD2000 amplifier channel', '0', '0'),
       ('RHD2000 auxiliary input channel', '1', '1'),
       ('USB board ADC input channel', '3', '3'),
       ('USB board digital input channel', '4', '4')],
      dtype=[('name', '<U64'), ('id', '<U64'), ('buffer_id', '<U64')])
--------------------
intan_fpc_test_231117_052630
.rhd
Your branch is up to date with 'origin/master'.
Switched to branch 'master'
array([('RHD2000 amplifier channel', '0', ''),
       ('RHD2000 auxiliary input channel', '1', ''),
       ('USB board ADC input channel', '3', ''),
       ('USB board digital input channel', '4', '')],
      dtype=[('name', '<U64'), ('id', '<U64'), ('buffer_id', '<U64')])
Your branch is up to date with 'origin/fix_intan_one_file_per_channel'.
Switched to branch 'fix_intan_one_file_per_channel'
array([('RHD2000 amplifier channel', '0', ''),
       ('RHD2000 auxiliary input channel', '1', ''),
       ('USB board ADC input channel', '3', ''),
       ('USB board digital input channel', '4', '')],
      dtype=[('name', '<U64'), ('id', '<U64'), ('buffer_id', '<U64')])
--------------------
intan_fps_rhs_test_240329_091536
.rhs
Your branch is up to date with 'origin/master'.
Switched to branch 'master'
array([('RHS2000 amplifier channel', '0', '0'),
       ('USB board ADC input channel', '3', '3'),
       ('USB board ADC output channel', '4', '4'),
       ('USB board digital input channel', '5', '5'),
       ('USB board digital output channel', '6', '6'),
       ('DC Amplifier channel', '10', '10'), ('Stim channel', '11', '11')],
      dtype=[('name', '<U64'), ('id', '<U64'), ('buffer_id', '<U64')])
Your branch is up to date with 'origin/fix_intan_one_file_per_channel'.
Switched to branch 'fix_intan_one_file_per_channel'
array([('RHS2000 amplifier channel', '0', '0'),
       ('USB board ADC input channel', '3', '3'),
       ('USB board ADC output channel', '4', '4'),
       ('USB board digital input channel', '5', '5'),
       ('USB board digital output channel', '6', '6'),
       ('DC Amplifier channel', '10', '10'), ('Stim channel', '11', '11')],
      dtype=[('name', '<U64'), ('id', '<U64'), ('buffer_id', '<U64')])
--------------------
intan_fpc_rhs_test_240329_091637
.rhs
Your branch is up to date with 'origin/master'.
Switched to branch 'master'
/home/heberto/development/python-neo/neo/rawio/intanrawio.py:826: UserWarning: Stim not implemented for `one-file-per-channel` due to lack of test files
  warnings.warn("Stim not implemented for `one-file-per-channel` due to lack of test files")
Switched to branch 'fix_intan_one_file_per_channel'
array([('RHS2000 amplifier channel', '0', ''),
       ('USB board ADC input channel', '3', ''),
       ('USB board ADC output channel', '4', ''),
       ('USB board digital input channel', '5', ''),
       ('USB board digital output channel', '6', ''),
       ('DC Amplifier channel', '10', '')],
      dtype=[('name', '<U64'), ('id', '<U64'), ('buffer_id', '<U64')])
Your branch is up to date with 'origin/fix_intan_one_file_per_channel'.
array([('RHS2000 amplifier channel', '0', ''),
       ('USB board ADC input channel', '3', ''),
       ('USB board ADC output channel', '4', ''),
       ('USB board digital input channel', '5', ''),
       ('USB board digital output channel', '6', ''),
       ('DC Amplifier channel', '10', '')],
      dtype=[('name', '<U64'), ('id', '<U64'), ('buffer_id', '<U64')])
/home/heberto/development/python-neo/neo/rawio/intanrawio.py:826: UserWarning: Stim not implemented for `one-file-per-channel` due to lack of test files
  warnings.warn("Stim not implemented for `one-file-per-channel` due to lack of test files")

Copy link
Contributor

@zm711 zm711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with this then. I'll ping Sam to look at our discussion, but as long as this maintains behavior then the discussion is more for future development than this PR. Thanks Heberto for putting up with me :)

@h-mayorquin
Copy link
Contributor Author

I branched the index discussion as a separate issue because it seems like an important thing:

#1604

@zm711 zm711 added this to the 0.14.0 milestone Dec 6, 2024
Copy link
Contributor

@zm711 zm711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last comment then ready to merge @h-mayorquin .

Comment on lines 82 to 87
channel_names = [p.name[4:-4] for p in amplifier_file_paths]

for channel_name, amplifier_file_path in zip(channel_names, amplifier_file_paths):
data_raw = np.fromfile(amplifier_file_path, dtype=np.int16).squeeze()
data_from_neo = intan_reader.get_analogsignal_chunk(channel_ids=[channel_name], stream_index=0).squeeze()
np.testing.assert_allclose(data_raw, data_from_neo)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we get a note in comment about why this works...

so p.name[4:-4] means nothing to someone who doesn't know the file format. So I would explain why 4:-4. And if you're feeling ambitious maybe a better name than p (since I know a certain developer who hates one letter variable names....).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I was lazy here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@zm711 zm711 self-assigned this Dec 13, 2024
@zm711
Copy link
Contributor

zm711 commented Dec 13, 2024

Sorry I merged main into the branch, but this bug that is fixed on main is now popping up on RTD here. It appeared on the PR we just merged. Sorry let me try to track it down.

@h-mayorquin
Copy link
Contributor Author

git log revert?

@zm711
Copy link
Contributor

zm711 commented Dec 13, 2024

It broke your other PR too. So I would love to forget it out. You want to look at the log and see if you see anything ? To me it looks like the code should be fine, so I don't know why it is using an old version of the plexon code.... I've been googling to see if there is a cache I can empty but seems like RTD doesn't make it easy.

@zm711
Copy link
Contributor

zm711 commented Dec 13, 2024

Okay I found another Plexon overflow that looked the same as the last one so I didn't even realize it was a different one which had messed with me. Once we test and merge #1612 we can merge main into your PRs and that should fix everything :)

@h-mayorquin
Copy link
Contributor Author

I guessed that Pleon will bite you, one way or the other.

@zm711 zm711 merged commit 45ec350 into NeuralEnsemble:master Dec 14, 2024
3 checks passed
@samuelgarcia
Copy link
Contributor

Hi Ramon
This is a big clean. thank you.
OK for dict approach.

@h-mayorquin h-mayorquin deleted the fix_intan_one_file_per_channel branch January 7, 2025 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Intan "one-file-per-channel-map" does not map channel to file names correctly
3 participants