
[Bug fix] Fix bug of synchrotool index out of bounds #612

Closed

Conversation

@morales-gregorio (Collaborator) commented Dec 26, 2023

Hi,

The bug was a known issue: when applying the Synchrotool, if there was a synchrofact in the last bin, the indices went out of bounds. Here is a minimal code example to reproduce the error:

import neo
import numpy as np
import quantities as pq
from elephant.spike_train_synchrony import Synchrotool

sampling_rate = 1/pq.ms
st = neo.SpikeTrain(np.arange(0, 11)*pq.ms, t_start=0*pq.ms, t_stop=10*pq.ms)

synchrotool_instance = Synchrotool([st, st], sampling_rate, spread=0)
synchrotool_instance.annotate_synchrofacts()

which (with the master branch elephant, python 3.9.7) produces the output:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[2], line 5
      2 st = neo.SpikeTrain(np.arange(0, 11)*pq.ms, t_start=0*pq.ms, t_stop=10*pq.ms)
      4 synchrotool_instance = Synchrotool([st, st], sampling_rate, spread=0)
----> 5 synchrotool_instance.annotate_synchrofacts()

File elephant/elephant/spike_train_synchrony.py:414, in Synchrotool.annotate_synchrofacts(self)
    406 for idx, st in enumerate(self.input_spiketrains):
    407 
    408     # all indices of spikes that are within the half-open intervals
    409     # defined by the boundaries
    410     # note that every second entry in boundaries is an upper boundary
    411     spike_to_epoch_idx = np.searchsorted(
    412         right_edges,
    413         st.times.rescale(self.epoch.times.units).magnitude.flatten())
--> 414     complexity_per_spike = epoch_complexities[spike_to_epoch_idx]
    416     st.array_annotate(complexity=complexity_per_spike)
    418 self.annotated = True

IndexError: index 10 is out of bounds for axis 0 with size 10
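The mechanism behind this IndexError can be sketched with the standard library alone (the edge values below are illustrative, not taken from elephant's internals): np.searchsorted with the default side='left' behaves like bisect.bisect_left, so a spike time beyond the last right edge is assigned an index one past the end of the complexity array.

```python
import bisect

# Illustrative values: 10 epochs, hence 10 right edges (bins at sampling
# resolution, shifted half a bin to the left: 0.5 ms .. 9.5 ms).
right_edges = [0.5 + i for i in range(10)]
epoch_complexities = [2] * 10                     # one entry per epoch

# Spike times 0 ms .. 10 ms, as in the reproducer above.
spike_times = list(range(11))
spike_to_epoch_idx = [bisect.bisect_left(right_edges, t) for t in spike_times]
print(spike_to_epoch_idx[-1])   # 10 -> out of bounds for a 10-entry array
```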

The following is outdated
To fix this issue I suggest a very simple change in the definition of the bins. To avoid spikes at bin edges (when binning at sampling resolution), the bins were shifted half a bin width to the left. This leads to one additional (half-)bin at the end of the time series, which only causes the issue when a synchrofact falls into this last bin, which is rare in real data. Instead of shifting to the left, I propose shifting to the right; this puts two initial spikes in the first bin, but leaves no extra bin at the end.

Let me know what you think,
Aitor

PS: I see that some of the tests fail with this change due to small differences. Let me know if the change is ok and I will fix the tests.

@Moritz-Alexander-Kern (Member)

Hey Aitor,
thanks for suggesting those changes and creating a pull request. 🎉

This seems to be similar to the discussion we had on this issue, see #493 .

The question seems to be whether we agree on those changes with respect to the definition of the bins; the code looks good to me.

Regarding the small differences in the results which lead to failing unit tests: we need to discuss whether those differences are significant and pose a problem with respect to reproducibility of previous results calculated with the "old" way of defining the bins.

@mdenker added this to the v1.1.0 milestone Jan 17, 2024
@Moritz-Alexander-Kern (Member) commented Feb 5, 2024

… only appears in synchrotool and binnedspiketrains are used in many other places
Co-authored-by: skrausse <[email protected]>
Co-authored-by: jo460464
@morales-gregorio (Collaborator, Author)

As discussed in the meeting on 2024-02-05, BinnedSpikeTrain is not the correct place to fix this, since it can have unexpected ramifications. Instead, the fix by @skrausse and @jo460464 was transferred to this PR.

@Moritz-Alexander-Kern modified the milestones: v1.1.0, v1.2.0 Mar 25, 2024
@Moritz-Alexander-Kern (Member)

Hey @morales-gregorio ,
thanks for the contribution.

Is this still open?

PS: I see some of the tests fail with this change due to small differences, let me know if this change is ok and I will fix the tests

@morales-gregorio (Collaborator, Author)

Hi, yes, I am looking at the tests now.

@morales-gregorio (Collaborator, Author)

It turns out there is a very good reason why those tests were failing: shifting the bins rightward causes unpredictable behavior when there are coincident spikes in the first bin, which messed with the complexity histogram, the annotations, etc.

I propose a new, simpler method: extending t_stop by one bin (this does not modify the t_stop of the original spiketrains!). This solves the problem without all the additional weirdness around the first bin.
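The idea can be sketched with the standard library (illustrative values; the actual change operates on neo/quantities objects inside the complexity machinery):

```python
import bisect

# Right edges from the failing example: the last edge is 9.5 ms, so a
# spike at 10 ms would map one index past the complexity array.
right_edges = [0.5 + i for i in range(10)]
bin_width = 1.0  # ms

# Extending t_stop by one bin adds one more right edge (and one more
# complexity entry), so the spike at 10 ms now gets a valid index.
extended_edges = right_edges + [right_edges[-1] + bin_width]
idx = bisect.bisect_left(extended_edges, 10.0)
print(idx)   # 10, a valid index into the now 11-entry complexity array
```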

@Moritz-Alexander-Kern self-requested a review April 4, 2024 12:21
@morales-gregorio (Collaborator, Author)

As just discussed with @Moritz-Alexander-Kern, the neo version matters for SpikeTrainList indexing, so I have now updated the requirements to neo>=0.13.0.

There seems to be a separate issue that mamba is heavily outdated:

package mamba-0.0.10-py27h770b8ee_1 requires python >=2.7,<2.8.0a0, but none of the providers can be installed

@pep8speaks commented Apr 4, 2024

Hello @morales-gregorio! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2024-04-25 13:17:51 UTC

@Moritz-Alexander-Kern added the bugfix (Fix for an identified bug) label Apr 4, 2024
@coveralls (Collaborator) commented Apr 5, 2024

Coverage Status

coverage: 88.221% (-0.04%) from 88.264%
when pulling 3c25002 on morales-gregorio:synchrotool_fix
into 7d85845 on NeuralEnsemble:master.

@morales-gregorio (Collaborator, Author)

Hi @Moritz-Alexander-Kern, the tests are now all passing. Is anything still missing for the merge?

@Moritz-Alexander-Kern (Member)

Hi @morales-gregorio,

Hi @Moritz-Alexander-Kern the tests are now all passing, something missing for merge?

Thank you for the update. With regards to your question about the merge, everything looks good on the testing front – all tests are passing. However, before proceeding with the merge, there are a couple of points:

Approach Change:
We need to re-review: since the last review, the approach has changed:

I propose a new simplified method, by extending the t_stop by one bin (this does not modify the t_stop of the original spiketrains!). This solves the problem without all the additional weirdness around the first bin.

The previous understanding of the approach no longer stands, as the fix by @skrausse and @jo460464 is no longer utilized here; i.e., this no longer holds true:

Instead the Fix by @skrausse and @jo460464 was transferred to this PR

What was discussed regarding this PR is therefore also outdated; it seems only fair to update the people involved on the new status and allow them to update their comments. (The PR description is also outdated.)

Thanks again for your attention to this matter and for the progress made thus far.

@Moritz-Alexander-Kern (Member) left a comment

Hey @morales-gregorio,

thanks again for your work.

I added some print statements to the code example supplied in the PR description:

import neo
import numpy as np
import quantities as pq
from elephant.spike_train_synchrony import Synchrotool

sampling_rate = 1/pq.ms
st = neo.SpikeTrain(np.arange(0, 11)*pq.ms, t_start=0*pq.ms, t_stop=10*pq.ms)
print(f"Spike train before Synchrotool: {st}")
print(f"Spike train t_stop before Synchrotool: {st.t_stop}")

synchrotool_instance = Synchrotool([st, st], sampling_rate, spread=0)
synchrotool_instance.annotate_synchrofacts()

print(f"Spike train after Synchrotool: {st}")
print(f"Spike train t_stop after Synchrotool: {st.t_stop}")

I get the following output:

Spike train before Synchrotool: [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10.] ms
Spike train t_stop before Synchrotool: 10.0 ms
Spike train after Synchrotool: [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10.] ms
Spike train t_stop after Synchrotool: 11.0 ms

Is this intentional? I anticipated that the input spike trains would remain unchanged; the spike times are, but t_stop seems to have changed.

Note: I did not take the general approach of changing the behavior of the Complexity class into account in this review.

@morales-gregorio (Collaborator, Author)

> Is this intentional? I anticipated that the input spike trains would remain unchanged, the spike times are, but t_stop seems to be changed. […]

Hi, I thought this did not happen, but I have now also reproduced it in my implementation. I agree this could lead to some confusion.

The options I suggest are either:

  1. Work exclusively on a copy of the spiketrains; this doubles the memory usage, which was a concern for this implementation.
  2. Keep it as is, but raise a warning when t_stop is modified, so that users know it has happened.

I lean towards option 2, since extending t_stop by one time bin is a minimal change when working with long recordings.
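Option 2 could look roughly like the following (the helper name is hypothetical; the real change would live inside the tool itself):

```python
import warnings

def extend_t_stop_with_warning(t_stop, bin_width):
    """Hypothetical helper: extend t_stop by one bin and warn the user
    that the input spike train was modified."""
    new_t_stop = t_stop + bin_width
    warnings.warn(
        f"Spikes fall in the last bin: t_stop extended from {t_stop} "
        f"to {new_t_stop} to avoid an out-of-bounds epoch index.",
        UserWarning,
    )
    return new_t_stop

# Usage: the caller sees the warning once per modified spike train.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    t_stop = extend_t_stop_with_warning(10.0, 1.0)
print(t_stop, len(caught))   # 11.0 1
```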

@pbouss (Contributor) commented Jul 8, 2024

Hi @morales-gregorio and @Moritz-Alexander-Kern,
can this fix be merged into elephant?
It would help me with one of my projects.
Thanks, Peter

@Moritz-Alexander-Kern (Member) commented Jul 9, 2024

Hi @pbouss,

Could you please clarify which of the proposed fixes (A, B, or C) you believe would be most beneficial for your project?

A) The approach suggested here to modify t_stop of the spike train (This PR).

B) or C) Proposed fix: Condition on binned spiketrain – if the sampling rate is provided and the shift parameter is allowed, then shift the spiketrains; otherwise, warn about spikes being removed if they coincide with t_stop.

Originally posted by @skrausse in #493 (comment)

It's important to note that the underlying issue is not a bug with the code per se, but rather a decision regarding how to handle situations where the spike trains do not conform to the assumptions made by the author of the synchrotool regarding the input spike trains. Workarounds could include checking spike trains in advance and possibly changing t_stop before using synchrotool or removing/masking spikes.

Please also read #493, where this was discussed previously and the author @Kleinjohann suggested how to handle this issue with synchrotool.

The "fix" might be different depending on your use-case; please understand that this is why there is no straightforward solution.

@Moritz-Alexander-Kern (Member) commented Jul 10, 2024

Consider the following example:

import quantities as pq
from elephant import spike_train_synchrony

import elephant.spike_train_generation as stg
import numpy as np

np.random.seed(1)  # set random seed; with seed 2 there is no error

SPP = stg.StationaryPoissonProcess(50 * pq.Hz, t_start=0 * pq.ms,
                                   t_stop=1000*pq.ms)
test_sts = SPP.generate_n_spiketrains(2)

sampling_rate = 1/pq.ms

test_obj = spike_train_synchrony.Synchrotool(
    test_sts, sampling_rate=sampling_rate, spread=2)
test_obj.annotate_synchrofacts()

With the proposed "fix" I still get:

Traceback (most recent call last):
  File "git/INM-6/elephant/build/test2_612_snchrotool_index_out_of_bounds.py", line 20, in <module>
    test_obj.annotate_synchrofacts()
  File "git/INM-6/elephant/elephant/spike_train_synchrony.py", line 410, in annotate_synchrofacts
    complexity_per_spike = epoch_complexities[spike_to_epoch_idx]
IndexError: index 72 is out of bounds for axis 0 with size 72

So does this PR fix your issue, @pbouss? Maybe you can test this approach by installing from this branch with:

pip install git+https://github.com/morales-gregorio/elephant.git@synchrotool_fix

@pbouss (Contributor) commented Jul 10, 2024

> Consider the following example: […] With the proposed fix I still get: IndexError: index 72 is out of bounds for axis 0 with size 72. […] So does this PR fix your issue @pbouss ?

Hi @Moritz-Alexander-Kern, thanks for your message. I now understand a bit better why there are so many options.
I'm already happy with what is done here in this branch. I used it now for the SPADE analysis workflow (as you proposed) and there it ran without any error.
It's strange that it still does not work for the example you show here. Do you know why fix A) does not work in this case?

@Moritz-Alexander-Kern (Member) commented Jul 10, 2024

Hey,

Great to hear this fixes your issue! 🎉

Regarding the example I provided, @Kleinjohann pointed out that the spiketrains lack a sampling rate. Synchrotool assumes that spiketrains are discrete and that all spike times are multiples of the provided sampling rate. The suggested fix is to discretize the spiketrains before using Synchrotool:

from elephant.conversion import discretise_spiketimes
test_sts = discretise_spiketimes(test_sts, sampling_rate)
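What discretisation amounts to can be sketched with the standard library (this sketch snaps times down to the sampling grid; elephant's discretise_spiketimes additionally handles units, t_start/t_stop, and rounding details):

```python
import math

# Illustrative only: snap each spike time onto the sampling grid so that
# all spike times are exact multiples of the sampling period, as the
# Synchrotool assumes.
sampling_period = 1.0  # ms, i.e. sampling_rate = 1/ms
spike_times = [0.3, 1.7, 2.0, 9.99]
on_grid = [math.floor(t / sampling_period) * sampling_period
           for t in spike_times]
print(on_grid)   # [0.0, 1.0, 2.0, 9.0]
```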

But you can also "fix" the example I provided by changing the sampling rate to sampling_rate = 10/pq.ms or by setting spread=2.

There seem to be multiple ways this issue can arise and several potential solutions, depending on which assumptions about the input spiketrains are not met.

So far, the proposed fixes involve modifying the spiketrains, such as discretizing them or changing t_stop, to align with Synchrotool's assumptions.

This raises a question: should Synchrotool handle these cases, or is it the user's responsibility to ensure the spiketrains comply with Synchrotool's assumptions?

The fix proposed here could also be implemented manually, or is there a reason why this must be implemented in Synchrotool?
Sketch (with bin_size = 1 / sampling_rate):

# Check whether any spike falls in the last bin and, if so,
# extend t_stop by one bin to avoid the indexing problem
bin_size = 1 / sampling_rate
for st in input_spiketrains:
    if st.times[-1] > st.t_stop - bin_size:
        st.t_stop = st.t_stop + bin_size

# now call Synchrotool

Now, let me know: Is there a bug in Synchrotool, or are the input spike trains incorrect?

@pbouss (Contributor) commented Jul 10, 2024

> There seem to be multiple ways this issue can arise and several potential solutions […] Now, let me know: Is there a bug in Synchrotool, or are the input spike trains incorrect?

In my case, I'm pretty sure the input spike trains are correct, and they should also have a sampling_rate, as the data come from the R2G experiment, which has a sampling rate.
In general, I would argue that Synchrotool should always give a reasonable output, or at least an error message that the user can act on. In my case, the same code raised the IndexError in 3 of 20 sessions, so it is much better to have the slight change in t_stop and have everything run smoothly.

@Moritz-Alexander-Kern (Member) commented Jul 10, 2024

Ah ok, I see.
Which of the options suggested by @morales-gregorio do you prefer? (I guess option 2?)

> The options I suggest are either: 1. Work exclusively on a copy of the spiketrains […] 2. Keep it as is but raise a warning when t_stop is modified […] I lean towards option 2 […]

@pbouss (Contributor) commented Jul 10, 2024

> Which of the suggested options from morales-gregorio do you prefer? (I guess Option 2?) […]

I think the second option is better: the change is minimal, especially relative to the bins considered in the Synchrotool, and the warning gives you the chance to apply your own fix if you like.

@Moritz-Alexander-Kern (Member)

I see, thanks for the feedback.

Further, I suggest also considering the fix by @jo460464 and @skrausse, see: https://github.com/jo460464/elephant/tree/enh/iss101

Install with: (make sure you use scipy<1.14.0)

pip install git+https://github.com/jo460464/elephant.git@enh/iss101

This approach does not change the spike trains; might it also work for you?

@Moritz-Alexander-Kern (Member)

As discussed in the meeting on 2024.02.05, BinnedSpiketrain is not the correct place to fix this since it can have unexpected ramifications. Instead the Fix by @skrausse and @jo460464 was transferred to this PR

I believe this was the fix originally intended to be implemented with this PR.

@pbouss (Contributor) commented Jul 10, 2024

> Further, I suggest to also consider the fix suggested by @jo460464 and @skrausse […] This approach does not change the spike trains but might also work for you?

Thank you. I think for now I'm fine with what I have working. For the next iteration, I will also include this.

@Moritz-Alexander-Kern removed this from the v1.2.0 milestone Jul 17, 2024
@Moritz-Alexander-Kern (Member)

Closed in favor of #637.
