Speedup fix - avoid creating a new array every time #145

Open: wants to merge 4 commits into master
Conversation

@jucas1 jucas1 commented Jul 28, 2021

Fix for a performance bottleneck with large FAT tables.
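To make the change concrete, here is a schematic before/after sketch of the pattern this PR describes (illustrative only, not olefile's exact code): building up a FAT as an array of sector indexes, one sector at a time.

```python
from array import array

def load_fat_slow(sectors):
    # Old pattern: "+" allocates a brand-new array and copies
    # everything accumulated so far on every iteration, so the
    # total cost grows quadratically with the number of sectors.
    fat = array("I")
    for sect in sectors:
        fat = fat + sect
    return fat

def load_fat_fast(sectors):
    # New pattern: extend the same array in place, which is
    # amortized O(1) per element, so the total cost is linear.
    fat = array("I")
    for sect in sectors:
        fat.extend(sect)
    return fat

# 1000 FAT sectors of 128 entries each (typical 512-byte sectors
# hold 128 four-byte entries).
sectors = [array("I", range(128)) for _ in range(1000)]
assert load_fat_slow(sectors[:10]) == load_fat_fast(sectors[:10])
```

Both functions produce identical results; only the accumulation strategy differs.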
@decalage2 decalage2 self-requested a review July 28, 2021 20:52
@decalage2 decalage2 self-assigned this Jul 28, 2021
@decalage2 decalage2 added this to the olefile 0.47 milestone Jul 28, 2021
@decalage2 (Owner)

Hi @jucas1, thanks a lot for this PR. Do you have a sample where it makes a big difference? I would just like to test the improvement.

@jucas1 (Author) commented Jul 29, 2021

@decalage2
Hello,
Unfortunately, I can't publish my test files (production CAD files). They are in SolidWorks formats (.sldprt, .sldasm). The speedup applies to any OLE container with large directories (in some cases .sld* directories contain hundreds to thousands of items). I made a simple benchmark:

from array import array
a = array("Q", range(100))

%%timeit
biga = a[:]
for _ in range(200):
    biga = biga + a
336 µs ± 6.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
biga = a[:]
for _ in range(200):
    biga.extend(a)
17 µs ± 301 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%%timeit
biga = a[:]
for _ in range(10000):
    biga = biga + a
3.5 s ± 43.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
biga = a[:]
for _ in range(10000):
    biga.extend(a)
1.31 ms ± 29.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Concatenating arrays with the "+" operator has quadratic total cost (every "+" copies the entire accumulated array), while .extend is amortized linear. In my use case this gave a 5-10x speedup (for batch processing).

The "+=" operator also has the same effect (for arrays it extends in place rather than copying), and is even slightly faster than .extend:

%%timeit
biga = a[:]
for _ in range(10000):
    biga += a
1.12 ms ± 3.28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
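For anyone without IPython, here is a standalone version of the benchmark above using the stdlib timeit module instead of the %%timeit magic (a sketch; timings will vary by machine):

```python
from array import array
import timeit

a = array("Q", range(100))

def concat_plus(n):
    # Rebinds biga to a freshly allocated array on each iteration;
    # every "+" copies the whole accumulated array, so total work
    # grows quadratically with n.
    biga = a[:]
    for _ in range(n):
        biga = biga + a
    return biga

def concat_extend(n):
    # Appends in place with amortized O(1) per element, so total
    # work is roughly linear in n.
    biga = a[:]
    for _ in range(n):
        biga.extend(a)
    return biga

# Both strategies build the same array.
assert concat_plus(200) == concat_extend(200)

t_plus = timeit.timeit(lambda: concat_plus(200), number=100)
t_extend = timeit.timeit(lambda: concat_extend(200), number=100)
print(f"+ : {t_plus:.4f}s   extend: {t_extend:.4f}s")
```

On a typical machine the extend variant is one to two orders of magnitude faster at this size, consistent with the numbers above.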

jucas1 added 3 commits July 29, 2021 16:36
        # TODO: would it be more efficient using a dict or hash values,
        #       instead of a list of long?

ztravis commented Dec 8, 2022

+1 for this change, storing the used stream indexes in a list makes parsing quadratic in the number of streams; changing to use a set is a very easy fix.
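A minimal illustration of the point above (hypothetical names, not olefile's actual code): tracking used stream indexes and checking membership on each new index. With a list, each check scans the whole collection, making the loop quadratic overall; with a set, each check is an O(1) average hash lookup.

```python
import timeit

def track_with_list(n):
    used = []
    for sid in range(n):
        if sid in used:          # O(len(used)) linear scan per check
            raise ValueError("duplicate stream index")
        used.append(sid)
    return used

def track_with_set(n):
    used = set()
    for sid in range(n):
        if sid in used:          # O(1) average hash lookup
            raise ValueError("duplicate stream index")
        used.add(sid)
    return used

t_list = timeit.timeit(lambda: track_with_list(1000), number=5)
t_set = timeit.timeit(lambda: track_with_set(1000), number=5)
print(f"list: {t_list:.4f}s   set: {t_set:.4f}s")
```

The gap widens rapidly with the number of streams, which is why files with thousands of directory entries hit this bottleneck hardest.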
