Speedup fix - avoid creating a new array every time #145

Open: wants to merge 4 commits into master
Conversation

@jucas1 jucas1 commented Jul 28, 2021

Fix for a performance bottleneck with large FAT tables.
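To make the change concrete, here is a schematic before/after sketch of the pattern this PR describes (illustrative only, not olefile's exact code): building up a FAT as an array of sector indexes, one sector at a time.

```python
from array import array

def load_fat_slow(sectors):
    # Old pattern: "+" allocates a brand-new array and copies
    # everything accumulated so far on every iteration, so the
    # total cost grows quadratically with the number of sectors.
    fat = array("I")
    for sect in sectors:
        fat = fat + sect
    return fat

def load_fat_fast(sectors):
    # New pattern: extend the same array in place, which is
    # amortized O(1) per element, so the total cost is linear.
    fat = array("I")
    for sect in sectors:
        fat.extend(sect)
    return fat

# 1000 FAT sectors of 128 entries each (typical 512-byte sectors
# hold 128 four-byte entries).
sectors = [array("I", range(128)) for _ in range(1000)]
assert load_fat_slow(sectors[:10]) == load_fat_fast(sectors[:10])
```

Both functions produce identical results; only the accumulation strategy differs.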
@decalage2 decalage2 self-requested a review July 28, 2021 20:52
@decalage2 decalage2 self-assigned this Jul 28, 2021
@decalage2 decalage2 added this to the olefile 0.47 milestone Jul 28, 2021
@decalage2 (Owner)

Hi @jucas1, thanks a lot for this PR. Do you have a sample where it makes a big difference? I would just like to test the improvement.

@jucas1 (Author) commented Jul 29, 2021

@decalage2
Hello,
Unfortunately, I can't publish my test files (production CAD files). They are in SolidWorks formats (.sldprt, .sldasm). The speedup applies to any OLE container with large directories (in some cases .sld* directories contain hundreds to thousands of items). I made a simple benchmark:

from array import array
a = array("Q", range(100))

%%timeit
biga = a[:]
for _ in range(200):
    biga = biga + a
336 µs ± 6.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
biga = a[:]
for _ in range(200):
    biga.extend(a)
17 µs ± 301 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%%timeit
biga = a[:]
for _ in range(10000):
    biga = biga + a
3.5 s ± 43.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
biga = a[:]
for _ in range(10000):
    biga.extend(a)
1.31 ms ± 29.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Concatenating arrays with the "+" operator has quadratic total cost (every "+" copies the entire accumulated array), while .extend is amortized linear. In my use case this gave a 5-10x speedup (for batch processing).

The "+=" operator also has the same effect (for arrays it extends in place rather than copying), and is even slightly faster than .extend:

%%timeit
biga = a[:]
for _ in range(10000):
    biga += a
1.12 ms ± 3.28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
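For anyone without IPython, here is a standalone version of the benchmark above using the stdlib timeit module instead of the %%timeit magic (a sketch; timings will vary by machine):

```python
from array import array
import timeit

a = array("Q", range(100))

def concat_plus(n):
    # Rebinds biga to a freshly allocated array on each iteration;
    # every "+" copies the whole accumulated array, so total work
    # grows quadratically with n.
    biga = a[:]
    for _ in range(n):
        biga = biga + a
    return biga

def concat_extend(n):
    # Appends in place with amortized O(1) per element, so total
    # work is roughly linear in n.
    biga = a[:]
    for _ in range(n):
        biga.extend(a)
    return biga

# Both strategies build the same array.
assert concat_plus(200) == concat_extend(200)

t_plus = timeit.timeit(lambda: concat_plus(200), number=100)
t_extend = timeit.timeit(lambda: concat_extend(200), number=100)
print(f"+ : {t_plus:.4f}s   extend: {t_extend:.4f}s")
```

On a typical machine the extend variant is one to two orders of magnitude faster at this size, consistent with the numbers above.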

jucas1 added 3 commits July 29, 2021 16:36
        # TODO: would it be more efficient using a dict or hash values,
        #       instead of a list of long?

ztravis commented Dec 8, 2022

+1 for this change, storing the used stream indexes in a list makes parsing quadratic in the number of streams; changing to use a set is a very easy fix.
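A minimal illustration of the point above (hypothetical names, not olefile's actual code): tracking used stream indexes and checking membership on each new index. With a list, each check scans the whole collection, making the loop quadratic overall; with a set, each check is an O(1) average hash lookup.

```python
import timeit

def track_with_list(n):
    used = []
    for sid in range(n):
        if sid in used:          # O(len(used)) linear scan per check
            raise ValueError("duplicate stream index")
        used.append(sid)
    return used

def track_with_set(n):
    used = set()
    for sid in range(n):
        if sid in used:          # O(1) average hash lookup
            raise ValueError("duplicate stream index")
        used.add(sid)
    return used

t_list = timeit.timeit(lambda: track_with_list(1000), number=5)
t_set = timeit.timeit(lambda: track_with_set(1000), number=5)
print(f"list: {t_list:.4f}s   set: {t_set:.4f}s")
```

The gap widens rapidly with the number of streams, which is why files with thousands of directory entries hit this bottleneck hardest.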
