read/readall dumps the decompressed files to memory, instead of streaming them #579
There is a problem with reading large files whose decompressed form exceeds the available RAM:
The library (namely the read/readall methods) first decompresses the files into memory as BytesIO objects and then returns them. While that works well for small files, it fails with an out-of-memory error for bigger ones.
It would be better if the library streamed the files, just like standard file I/O.
To Reproduce
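The original reproduction steps were not captured here; below is a minimal sketch that should trigger the problem. The archive path is a placeholder for any large 7z file, such as the Wikipedia dump mentioned under Environment:

```python
import py7zr

# Placeholder path: any 7z archive whose decompressed size exceeds RAM,
# e.g. a Wikipedia dump (~246.6 MB compressed, ~36 GB decompressed).
with py7zr.SevenZipFile("enwiki-dump.7z", mode="r") as archive:
    # readall() decompresses every member into an in-memory BytesIO
    # before returning, so this single call can exhaust available RAM.
    contents = archive.readall()
```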
Expected behavior
The library should allocate only as much memory as is actually needed for the data requested, and should allow streaming files even when their decompressed form exceeds the available memory and disk space.
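For illustration, here is one shape such a streaming interface could take. This is purely a hypothetical sketch: neither `archive.open()` nor its on-demand decompression semantics exist in py7zr's current API.

```python
import py7zr

with py7zr.SevenZipFile("enwiki-dump.7z", mode="r") as archive:
    # Hypothetical method: a file-like handle that decompresses on demand
    # instead of materializing the whole member in memory.
    with archive.open("enwiki-pages-articles.xml") as stream:
        while chunk := stream.read(1 << 20):  # read 1 MiB at a time
            process(chunk)  # placeholder for user code
```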
Environment:
(the Wikipedia dump file used as an example is 246.6 MB in compressed form, and 36 GB when decompressed)
Comments

There is a main loop in the extraction code. With this structure, py7zr originally extracted files into the file system, and @Zoynels contributed a memory IO feature in #111. (Both existing paths are sketched after these comments.)

I face the same OOM problem with large 7zip files. Hoping for an improvement!

There is an idea for an extension that I would like to try as a topic branch, and I expect @itanfeng, @starkapandan, and @jaboja to test and review the change. It may be a compatibility-breaking change, and I want to deprecate the old behavior.
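A short sketch of the two existing paths described above, using py7zr's public API; the archive and member names are placeholders:

```python
import py7zr

# Original path: the main loop writes decompressed members to the file system.
with py7zr.SevenZipFile("archive.7z", mode="r") as z:
    z.extractall(path="out/")

# Memory IO path from #111: read selected members into in-memory objects.
with py7zr.SevenZipFile("archive.7z", mode="r") as z:
    data = z.read(targets=["doc.txt"])  # dict mapping member name -> BytesIO
```

Both paths fully decompress each requested member before handing it back, which is why neither helps when a single member is larger than the available RAM.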