Losing performance somewhere #23

Closed
djanloo opened this issue Jul 3, 2024 · 4 comments
Labels: invalid (This doesn't seem right), technical (Concerns the code, not the physics)

Comments


djanloo commented Jul 3, 2024

This is the old performance on the basal ganglia benchmark:

Building connections.. ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:04
Running network consisting of 14622 neurons for 6000 timesteps
--------------------------------------------------
**************************************************
Simulation took 8 s	(1.39433 ms/step)
	Gathering time avg: 0 us/step
	Inject time avg: 147.633 us/step
Population evolution stats:
	0:
		evolution:	380.659 us/step	---	63 ns/step/neuron
		spike emission:	34.1322 us/step	---	5 ns/step/neuron
	1:
		evolution:	373.055 us/step	---	62 ns/step/neuron
		spike emission:	32.8957 us/step	---	5 ns/step/neuron
	2:
		evolution:	70.5442 us/step	---	167 ns/step/neuron
		spike emission:	9.53733 us/step	---	22 ns/step/neuron
	3:
		evolution:	97.5183 us/step	---	125 ns/step/neuron
		spike emission:	17.2243 us/step	---	22 ns/step/neuron
	4:
		evolution:	53.9407 us/step	---	207 ns/step/neuron
		spike emission:	2.33867 us/step	---	8 ns/step/neuron
	5:
		evolution:	62.5945 us/step	---	153 ns/step/neuron
		spike emission:	3.75417 us/step	---	9 ns/step/neuron
	6:
		evolution:	96.7672 us/step	---	128 ns/step/neuron
		spike emission:	3.33033 us/step	---	4 ns/step/neuron

And this is the performance on the same benchmark now:

[2024-07-03 18:51:31] - PID 124110237631616 - INFO: Evolving spiking network from t= 0 to t= 600
Running network consisting of 14622 neurons for 6000 timesteps
--------------------------------------------------
**************************************************
[2024-07-03 18:51:45] - PID 124110237631616 - INFO: Output for PerformanceManager <spiking network>
        --inject        1.0 s   --  -- 237.0 us /step for 6000 steps --
        --monitorize    374.0 us        --  -- 62.0 ns /step for 6000 steps --
        --simulation    14.0 s

[2024-07-03 18:51:45] - PID 124110237631616 - INFO: Output for PerformanceManager <Population 0>
        --evolution     4.0 s   --  -- 697.0 us /step for 6000 steps -- -- 116.0 ns /step/unit for 6000 steps and 6000 units
        --spike_emission        339.0 ms        --  -- 56.0 us /step for 6000 steps --  -- 9.0 ns /step/unit for 6000 steps and 6000 units

[2024-07-03 18:51:45] - PID 124110237631616 - INFO: Output for PerformanceManager <Population 1>
        --evolution     3.0 s   --  -- 659.0 us /step for 6000 steps -- -- 109.0 ns /step/unit for 6000 steps and 6000 units
        --spike_emission        355.0 ms        --  -- 59.0 us /step for 6000 steps --  -- 9.0 ns /step/unit for 6000 steps and 6000 units

[2024-07-03 18:51:45] - PID 124110237631616 - INFO: Output for PerformanceManager <Population 2>
        --evolution     714.0 ms        --  -- 119.0 us /step for 6000 steps -- -- 283.0 ns /step/unit for 6000 steps and 420 units
        --spike_emission        95.0 ms --  -- 15.0 us /step for 6000 steps --  -- 37.0 ns /step/unit for 6000 steps and 420 units

[2024-07-03 18:51:45] - PID 124110237631616 - INFO: Output for PerformanceManager <Population 3>
        --evolution     942.0 ms        --  -- 157.0 us /step for 6000 steps -- -- 201.0 ns /step/unit for 6000 steps and 780 units
        --spike_emission        173.0 ms        --  -- 28.0 us /step for 6000 steps --  -- 37.0 ns /step/unit for 6000 steps and 780 units

[2024-07-03 18:51:45] - PID 124110237631616 - INFO: Output for PerformanceManager <Population 4>
        --evolution     468.0 ms        --  -- 78.0 us /step for 6000 steps --  -- 300.0 ns /step/unit for 6000 steps and 260 units
        --spike_emission        25.0 ms --  -- 4.0 us /step for 6000 steps --   -- 16.0 ns /step/unit for 6000 steps and 260 units

[2024-07-03 18:51:45] - PID 124110237631616 - INFO: Output for PerformanceManager <Population 5>
        --evolution     593.0 ms        --  -- 98.0 us /step for 6000 steps --  -- 242.0 ns /step/unit for 6000 steps and 408 units
        --spike_emission        35.0 ms --  -- 5.0 us /step for 6000 steps --   -- 14.0 ns /step/unit for 6000 steps and 408 units

[2024-07-03 18:51:45] - PID 124110237631616 - INFO: Output for PerformanceManager <Population 6>
        --evolution     959.0 ms        --  -- 159.0 us /step for 6000 steps -- -- 212.0 ns /step/unit for 6000 steps and 754 units
        --spike_emission        27.0 ms --  -- 4.0 us /step for 6000 steps --   -- 6.0 ns /step/unit for 6000 steps and 754 units

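Almost everything got slower by a factor of roughly 1.5 to 2: the whole simulation went from 8 s to 14 s, and evolution in population 0 from 63 to 116 ns/step/neuron. Dispiriting.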
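
To put a number on the slowdown, the per-population evolution times from the two logs above can be divided directly. This is only a small sketch: the values are copied by hand from the logs, and the names `old_us`, `new_us` and `slowdown` are made up for the illustration.

```python
# Evolution time per step (us/step) for populations 0..6, copied from the logs above.
old_us = {0: 380.659, 1: 373.055, 2: 70.5442, 3: 97.5183,
          4: 53.9407, 5: 62.5945, 6: 96.7672}
new_us = {0: 697.0, 1: 659.0, 2: 119.0, 3: 157.0,
          4: 78.0, 5: 98.0, 6: 159.0}


def slowdown(old, new):
    """Return new/old for every population present in both runs."""
    return {pop: new[pop] / old[pop] for pop in old if pop in new}


ratios = slowdown(old_us, new_us)
for pop, ratio in sorted(ratios.items()):
    print(f"population {pop}: {ratio:.2f}x slower")
```

With these inputs every ratio falls between about 1.4 and 1.9, so the slowdown is fairly uniform across populations rather than concentrated in one of them.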

djanloo added the invalid and technical labels Jul 3, 2024

djanloo commented Jul 5, 2024

Good news: it does not depend that much on the code.

This is a run of the old code (f80dc35) before the development of the multiscale stuff:

Running network consisting of 14622 neurons for 6000 timesteps
--------------------------------------------------
**************************************************
Simulation took 14 s    (2.3965 ms/step)
        Gathering time avg: 0.0111667 us/step
        Inject time avg: 220.024 us/step
Population evolution stats:
        0:
                evolution:      701.335 us/step ---     116 ns/step/neuron
                spike emission: 52.0272 us/step ---     8 ns/step/neuron
        1:
                evolution:      678.471 us/step ---     113 ns/step/neuron
                spike emission: 57.8057 us/step ---     9 ns/step/neuron
        2:
                evolution:      118.561 us/step ---     282 ns/step/neuron
                spike emission: 15.2028 us/step ---     36 ns/step/neuron
        3:
                evolution:      160.356 us/step ---     205 ns/step/neuron
                spike emission: 27.9058 us/step ---     35 ns/step/neuron
        4:
                evolution:      80.1023 us/step ---     308 ns/step/neuron
                spike emission: 3.85283 us/step ---     14 ns/step/neuron
        5:
                evolution:      99.9843 us/step ---     245 ns/step/neuron
                spike emission: 5.41767 us/step ---     13 ns/step/neuron
        6:
                evolution:      161.416 us/step ---     214 ns/step/neuron
                spike emission: 4.68317 us/step ---     6 ns/step/neuron

Bad news: my ENTIRE machine became slower in six months.
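
One way to tell machine drift apart from a code regression is to time a fixed reference workload next to each benchmark run and record it with the results. The sketch below is only an illustration of that idea and is not part of quilt: `reference_workload` and `time_reference` are hypothetical names, and the loop is an arbitrary stand-in for "something that never changes".

```python
import time


def reference_workload(n_iterations=200_000):
    """Fixed, deterministic floating-point loop (arbitrary stand-in workload)."""
    x = 0.0
    for i in range(n_iterations):
        x += (i % 7) * 0.5
    return x


def time_reference(repeats=5, n_iterations=200_000):
    """Return the best wall-clock time in seconds over several repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        reference_workload(n_iterations)
        best = min(best, time.perf_counter() - start)
    return best


baseline_s = time_reference()
print(f"reference workload: {baseline_s * 1e3:.2f} ms")
```

If the reference time grows between two dates while the code under test is unchanged, the machine (thermal state, background load, power settings) is the more likely culprit.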


djanloo commented Jul 8, 2024

After solving #28, quilt has become a shared library. This is the new performance:

[2024-07-08 16:34:47] - PID 140292477301888 - INFO: Ouput for PerformanceRegistrar (8 managers )
[2024-07-08 16:34:47] - PID 140292477301888 - INFO: Output for PerformanceManager <spiking network>
_________inject _____4.0 s | 774.0 us/step for 6000 steps
_____monitorize ____7.0 ms | __1.0 us/step for 6000 steps
_____simulation ___127.0 s

[2024-07-08 16:34:47] - PID 140292477301888 - INFO: Output for PerformanceManager <Population 0>
______evolution ____45.0 s | __7.0 ms/step for 6000 steps | __1.0 us/step/unit for 6000 steps and 6000 units
_spike_emission _____1.0 s | 193.0 us/step for 6000 steps | _32.0 ns/step/unit for 6000 steps and 6000 units

[2024-07-08 16:34:47] - PID 140292477301888 - INFO: Output for PerformanceManager <Population 1>
______evolution ____44.0 s | __7.0 ms/step for 6000 steps | __1.0 us/step/unit for 6000 steps and 6000 units
_spike_emission _____1.0 s | 236.0 us/step for 6000 steps | _39.0 ns/step/unit for 6000 steps and 6000 units

[2024-07-08 16:34:47] - PID 140292477301888 - INFO: Output for PerformanceManager <Population 2>
______evolution _____5.0 s | 837.0 us/step for 6000 steps | __1.0 us/step/unit for 6000 steps and 420 units
_spike_emission __397.0 ms | _66.0 us/step for 6000 steps | 157.0 ns/step/unit for 6000 steps and 420 units

[2024-07-08 16:34:47] - PID 140292477301888 - INFO: Output for PerformanceManager <Population 3>
______evolution _____7.0 s | __1.0 ms/step for 6000 steps | __1.0 us/step/unit for 6000 steps and 780 units
_spike_emission __685.0 ms | 114.0 us/step for 6000 steps | 146.0 ns/step/unit for 6000 steps and 780 units

[2024-07-08 16:34:47] - PID 140292477301888 - INFO: Output for PerformanceManager <Population 4>
______evolution _____4.0 s | 730.0 us/step for 6000 steps | __2.0 us/step/unit for 6000 steps and 260 units
_spike_emission __101.0 ms | _16.0 us/step for 6000 steps | _65.0 ns/step/unit for 6000 steps and 260 units

[2024-07-08 16:34:47] - PID 140292477301888 - INFO: Output for PerformanceManager <Population 5>
______evolution _____4.0 s | 799.0 us/step for 6000 steps | __1.0 us/step/unit for 6000 steps and 408 units
_spike_emission __137.0 ms | _22.0 us/step for 6000 steps | _56.0 ns/step/unit for 6000 steps and 408 units

[2024-07-08 16:34:47] - PID 140292477301888 - INFO: Output for PerformanceManager <Population 6>
______evolution _____7.0 s | __1.0 ms/step for 6000 steps | __1.0 us/step/unit for 6000 steps and 754 units
_spike_emission ___94.0 ms | _15.0 us/step for 6000 steps | _20.0 ns/step/unit for 6000 steps and 754 units

That is about 15 times the original simulation time (127 s against roughly 8.4 s). This is unacceptable.
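
The factor can be checked from the logged numbers. The sketch below assumes the original total is best recovered from the 1.39433 ms/step figure in the first log (the "8 s" there is rounded), and that the 127 s and 45 s totals are themselves rounded to whole seconds, so the results are approximate. The helper name `per_step_per_unit_ns` is made up for the illustration.

```python
steps = 6000

# Original run: 1.39433 ms/step over 6000 steps (first log in this issue).
original_total_s = 1.39433 * steps / 1000.0

# Shared-library run: total simulation time as logged.
shared_lib_total_s = 127.0
slowdown = shared_lib_total_s / original_total_s


def per_step_per_unit_ns(total_s, n_steps, n_units):
    """Average cost in nanoseconds of one step for one unit."""
    return total_s / n_steps / n_units * 1e9


# Population 0 evolution: 45 s over 6000 steps and 6000 units.
pop0_ns = per_step_per_unit_ns(45.0, steps, 6000)

print(f"original total: {original_total_s:.2f} s")
print(f"slowdown: {slowdown:.1f}x")
print(f"population 0 evolution: {pop0_ns:.0f} ns/step/unit")
```

Under these assumptions population 0 costs on the order of 1250 ns per step per unit, against 63 ns in the first log.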

djanloo added a commit that referenced this issue Jul 8, 2024

djanloo commented Jul 8, 2024

After commit fff7797 :

Running network consisting of 14622 neurons for 6000 timesteps
--------------------------------------------------
**************************************************
[2024-07-08 17:02:24] - PID 140529642497152 - INFO: Ouput for PerformanceRegistrar (8 managers )
[2024-07-08 17:02:24] - PID 140529642497152 - INFO: Output for PerformanceManager <spiking network>
_________inject __895.0 ms | 149.0 us/step for 6000 steps
_____monitorize ___45.0 us | __7.0 ns/step for 6000 steps
_____simulation _____9.0 s

[2024-07-08 17:02:24] - PID 140529642497152 - INFO: Output for PerformanceManager <Population 0>
______evolution _____2.0 s | 420.0 us/step for 6000 steps | _70.0 ns/step/unit for 6000 steps and 6000 units
_spike_emission __240.0 ms | _40.0 us/step for 6000 steps | __6.0 ns/step/unit for 6000 steps and 6000 units

[2024-07-08 17:02:24] - PID 140529642497152 - INFO: Output for PerformanceManager <Population 1>
______evolution _____2.0 s | 408.0 us/step for 6000 steps | _68.0 ns/step/unit for 6000 steps and 6000 units
_spike_emission __257.0 ms | _42.0 us/step for 6000 steps | __7.0 ns/step/unit for 6000 steps and 6000 units

[2024-07-08 17:02:24] - PID 140529642497152 - INFO: Output for PerformanceManager <Population 2>
______evolution __446.0 ms | _74.0 us/step for 6000 steps | 177.0 ns/step/unit for 6000 steps and 420 units
_spike_emission ___61.0 ms | _10.0 us/step for 6000 steps | _24.0 ns/step/unit for 6000 steps and 420 units

[2024-07-08 17:02:24] - PID 140529642497152 - INFO: Output for PerformanceManager <Population 3>
______evolution __608.0 ms | 101.0 us/step for 6000 steps | 130.0 ns/step/unit for 6000 steps and 780 units
_spike_emission __123.0 ms | _20.0 us/step for 6000 steps | _26.0 ns/step/unit for 6000 steps and 780 units

[2024-07-08 17:02:24] - PID 140529642497152 - INFO: Output for PerformanceManager <Population 4>
______evolution __323.0 ms | _53.0 us/step for 6000 steps | 207.0 ns/step/unit for 6000 steps and 260 units
_spike_emission ___16.0 ms | __2.0 us/step for 6000 steps | _10.0 ns/step/unit for 6000 steps and 260 units

[2024-07-08 17:02:24] - PID 140529642497152 - INFO: Output for PerformanceManager <Population 5>
______evolution __386.0 ms | _64.0 us/step for 6000 steps | 157.0 ns/step/unit for 6000 steps and 408 units
_spike_emission ___25.0 ms | __4.0 us/step for 6000 steps | _10.0 ns/step/unit for 6000 steps and 408 units

[2024-07-08 17:02:24] - PID 140529642497152 - INFO: Output for PerformanceManager <Population 6>
______evolution __611.0 ms | 101.0 us/step for 6000 steps | 135.0 ns/step/unit for 6000 steps and 754 units
_spike_emission ___23.0 ms | __3.0 us/step for 6000 steps | __5.0 ns/step/unit for 6000 steps and 754 units

[2024-07-08 17:02:24] - PID 140529642497152 - INFO: Destroyed PerformanceRegistrar at index: 0x7fcf6e6f4cb0

So we are back to a good 70 ns/neuron/step (at least for the striatum) 😏


djanloo commented Jul 13, 2024

This is the performance after the discussion in #9:

Running network consisting of 14622 neurons for 6000 timesteps
--------------------------------------------------
**************************************************
[2024-07-13 16:02:38] - PID 127952158987392 - INFO: Ouput for PerformanceRegistrar (8 managers )
[2024-07-13 16:02:38] - PID 127952158987392 - INFO: Output for PerformanceManager <spiking network>
_________inject __812.4 ms | 135.4 us/step for 6000 steps
_____monitorize __448.0 us | _74.0 ns/step for 6000 steps
_____simulation _____6.8 s

[2024-07-13 16:02:38] - PID 127952158987392 - INFO: Output for PerformanceManager <Population 0>
______evolution _____1.6 s | 264.4 us/step for 6000 steps | _44.0 ns/step/unit for 6000 steps and 6000 units
_spike_emission __176.7 ms | _29.5 us/step for 6000 steps | __4.0 ns/step/unit for 6000 steps and 6000 units
_spike_handling __274.7 ms | _45.8 us/step for 6000 steps | __7.0 ns/step/unit for 6000 steps and 6000 units

[2024-07-13 16:02:38] - PID 127952158987392 - INFO: Output for PerformanceManager <Population 1>
______evolution _____1.5 s | 256.3 us/step for 6000 steps | _42.0 ns/step/unit for 6000 steps and 6000 units
_spike_emission __195.9 ms | _32.6 us/step for 6000 steps | __5.0 ns/step/unit for 6000 steps and 6000 units
_spike_handling __265.3 ms | _44.2 us/step for 6000 steps | __7.0 ns/step/unit for 6000 steps and 6000 units

[2024-07-13 16:02:38] - PID 127952158987392 - INFO: Output for PerformanceManager <Population 2>
______evolution __223.1 ms | _37.2 us/step for 6000 steps | _88.0 ns/step/unit for 6000 steps and 420 units
_spike_emission ___56.4 ms | __9.4 us/step for 6000 steps | _22.0 ns/step/unit for 6000 steps and 420 units
_spike_handling ___79.5 ms | _13.3 us/step for 6000 steps | _31.0 ns/step/unit for 6000 steps and 420 units

[2024-07-13 16:02:38] - PID 127952158987392 - INFO: Output for PerformanceManager <Population 3>
______evolution __319.3 ms | _53.2 us/step for 6000 steps | _68.0 ns/step/unit for 6000 steps and 780 units
_spike_emission __104.0 ms | _17.3 us/step for 6000 steps | _22.0 ns/step/unit for 6000 steps and 780 units
_spike_handling ___93.0 ms | _15.5 us/step for 6000 steps | _19.0 ns/step/unit for 6000 steps and 780 units

[2024-07-13 16:02:38] - PID 127952158987392 - INFO: Output for PerformanceManager <Population 4>
______evolution __148.7 ms | _24.8 us/step for 6000 steps | _95.0 ns/step/unit for 6000 steps and 260 units
_spike_emission ___11.3 ms | __1.9 us/step for 6000 steps | __7.0 ns/step/unit for 6000 steps and 260 units
_spike_handling ___75.8 ms | _12.6 us/step for 6000 steps | _48.0 ns/step/unit for 6000 steps and 260 units

[2024-07-13 16:02:38] - PID 127952158987392 - INFO: Output for PerformanceManager <Population 5>
______evolution __196.9 ms | _32.8 us/step for 6000 steps | _80.0 ns/step/unit for 6000 steps and 408 units
_spike_emission ___18.2 ms | __3.0 us/step for 6000 steps | __7.0 ns/step/unit for 6000 steps and 408 units
_spike_handling ___77.8 ms | _13.0 us/step for 6000 steps | _31.0 ns/step/unit for 6000 steps and 408 units

[2024-07-13 16:02:38] - PID 127952158987392 - INFO: Output for PerformanceManager <Population 6>
______evolution __311.8 ms | _52.0 us/step for 6000 steps | _68.0 ns/step/unit for 6000 steps and 754 units
_spike_emission ___16.0 ms | __2.7 us/step for 6000 steps | __3.0 ns/step/unit for 6000 steps and 754 units
_spike_handling __114.0 ms | _19.0 us/step for 6000 steps | _25.0 ns/step/unit for 6000 steps and 754 units

So after:

  • making quilt a shared lib
  • compiling with the appropriate flags
  • introducing thread pools

I gained a 25% speedup. Consider this issue closed.
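
For readers unfamiliar with the thread-pool pattern, here is a minimal sketch of the general idea: independent populations are evolved as separate tasks on a reusable pool, with a barrier at the end of every timestep. This is not quilt's implementation, which this thread does not show. `Population`, `evolve` and `run_network` are hypothetical names, and in pure Python the global interpreter lock means this shows the structure only, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


class Population:
    """Hypothetical stand-in for a group of units evolved together."""

    def __init__(self, n_units):
        self.state = [0.0] * n_units
        self.steps_done = 0

    def evolve(self, dt):
        # Toy dynamics: every unit simply integrates dt.
        self.state = [value + dt for value in self.state]
        self.steps_done += 1


def run_network(populations, n_steps, dt, n_workers=4):
    """Evolve all populations for n_steps, one pooled task per population."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(n_steps):
            # list() waits for every task, acting as a per-step barrier.
            list(pool.map(lambda pop: pop.evolve(dt), populations))


populations = [Population(n) for n in (60, 60, 4, 8)]
run_network(populations, n_steps=10, dt=0.1)
print([pop.steps_done for pop in populations])
```

The pool is created once and reused for every step, which avoids paying thread start-up cost 6000 times in a run like the ones above.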
