readall() raises Bad7zFile: CRC32 error #359

Open
DoNck opened this issue Sep 14, 2021 · 13 comments
Labels

  • bug: Something isn't working
  • for archiving: Issue on archiving, compression or encryption
  • for extraction: Issue on extraction, decompression or decryption
  • no-issue-activity

Comments

@DoNck
Contributor

DoNck commented Sep 14, 2021

Describe the bug
readall() raises Bad7zFile: CRC32 error

To Reproduce
Steps to reproduce the behavior:

  1. download and unzip tests.zip
  2. cd to the tests directory
  3. run pip install py7zr
  4. run: python python_7z.py ok.7z
    script should run fine and list the content found in the provided nested archive:
  5. run python python_7z.py ko.7z
    script fails instead of listing content:
$ python python_7z.py ko.7z
Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\py7zr\py7zr.py", line 604, in _extract
    self.worker.extract(
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\py7zr\py7zr.py", line 1183, in extract
    self.extract_single(
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\py7zr\py7zr.py", line 1276, in extract_single
    raise e
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\py7zr\py7zr.py", line 1264, in extract_single
    raise CrcError("{}".format(f.filename))
py7zr.exceptions.CrcError: EventConsumer.txt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "python_7z.py", line 19, in <module>
    list_archive(top_archive_name, top_archive)
  File "python_7z.py", line 11, in list_archive
    for filename, file_content in archive.readall().items():
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\py7zr\py7zr.py", line 940, in readall
    return self._extract(path=None, return_dict=True)
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\py7zr\py7zr.py", line 609, in _extract
    raise Bad7zFile("CRC32 error on archived file {}.".format(str(ce)))
py7zr.exceptions.Bad7zFile: CRC32 error on archived file EventConsumer.txt.

Expected behavior
List (nested) archive(s) content recursively.

Environment (please complete the following information):

  • OS: Windows 10
  • Python 3.8.8,
  • py7zr version: 0.16.1, installed by pip

Test data(please attach in the report):
ok.7z and ko.7z are attached inside the zip file, along with the Python script (.7z uploads are not allowed on GitHub).

Additional context
Both sample archives extract fine from 7z-FM 19.00 (x86).

miurahr added a commit that referenced this issue Sep 14, 2021
miurahr added a commit that referenced this issue Sep 16, 2021
@miurahr
Owner

miurahr commented Sep 17, 2021

ko.7z has the following header properties:

emptyfiles  = [True, True, True]
files = [
  {'emptystream': False, 'filename': 'Config.xml', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128},
  {'emptystream': True, 'filename': 'EventConsumer.log', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128},
  {'emptystream': False, 'filename': 'EventConsumer.txt', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128}, 
  {'emptystream': False, 'filename': 'processes1.csv', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128},
  {'emptystream': True, 'filename': 'processes1.log', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128},
  {'emptystream': False, 'filename': 'processes2.csv', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128},
  {'emptystream': True, 'filename': 'processes2.log', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128}
]
packsizes = [3100, 1, 21160, 1, 18794, 1]
unpacksizes = [28360, 0, 1068, 136588, 0, 124443, 0]
num_unpackstream_folders = [1, 0, 2, 0, 1, 0]
digests = [3531454146, 1149430100, 529556584, 1488982218]

@miurahr
Owner

miurahr commented Sep 17, 2021

Looking at the properties of 'EventConsumer.log', something is odd.

The following values mean the file has a packed stream of 1 byte, but the extracted file size is zero:

packsize = 1
unpacksize = 0
num_unpackstream_folders = 0

But the file definition says there is no stream of packed data:

emptystream = True

These two contradict each other.
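
A minimal sketch of the check described above, using only the header values quoted in this thread. It assumes that `packsizes` and `num_unpackstream_folders` line up one entry per folder (both have 6 entries here, matching "Blocks = 6"). The helper name `suspicious_folders` is hypothetical and is not part of py7zr.

```python
# Header values copied from the ko.7z dump quoted in this thread.
# Each entry is (filename, emptystream).
files = [
    ("Config.xml", False),
    ("EventConsumer.log", True),
    ("EventConsumer.txt", False),
    ("processes1.csv", False),
    ("processes1.log", True),
    ("processes2.csv", False),
    ("processes2.log", True),
]
packsizes = [3100, 1, 21160, 1, 18794, 1]
num_unpackstream_folders = [1, 0, 2, 0, 1, 0]
digests = [3531454146, 1149430100, 529556584, 1488982218]


def suspicious_folders(packsizes, streams_per_folder):
    """Return indexes of folders that carry packed bytes but declare no unpack stream.

    Assumption: the two lists are aligned, one entry per folder.
    """
    return [
        index
        for index, (packed, streams) in enumerate(zip(packsizes, streams_per_folder))
        if packed > 0 and streams == 0
    ]


non_empty = [name for name, empty in files if not empty]
print("non-empty files:        ", len(non_empty))
print("declared unpack streams:", sum(num_unpackstream_folders))
print("digests:                ", len(digests))
print("packed but streamless:  ", suspicious_folders(packsizes, num_unpackstream_folders))
```

The three counts agree with each other, so the digests cover exactly the non-empty files; the odd entries are the folders that hold a 1-byte pack stream without declaring any unpack stream.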

@miurahr
Owner

miurahr commented Sep 17, 2021

The file entries listed by the 7z command are:

Blocks = 6

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2021-09-08 23:08:33 .....        28360         3100  Config.xml
2021-09-08 23:08:33 .....            0            0  EventConsumer.log
2021-09-08 23:08:33 .....         1068        21160  EventConsumer.txt
2021-09-08 23:08:33 .....       136588               processes1.csv
2021-09-08 23:08:33 .....            0            0  processes1.log
2021-09-08 23:08:33 .....       124443        18794  processes2.csv
2021-09-08 23:08:33 .....            0            0  processes2.log
------------------- ----- ------------ ------------  ------------------------
2021-09-08 23:08:33             290459        43054  7 files

@miurahr
Owner

miurahr commented Sep 19, 2021

@DoNck How did you produce the data that triggers the issue?
It is actually a bug in both the producer and the extractor.

miurahr added the bug, for archiving, and for extraction labels Sep 19, 2021
@DoNck
Contributor Author

DoNck commented Sep 20, 2021

Hi @miurahr, thank you for your quick support.
This archive is produced by this tool.
I don't know if you tried, but I forgot to mention: both provided archives extract fine with the 7z CLI and GUI tools.
Could the py7zr library offer the same support even though something is wrong on the archive producer side?

@sc-anssi

Hi all !
The CRCError is raised on the EventConsumer.txt file (non-empty), not EventConsumer.log (empty). I don't know if this changes anything, but I was just checking we were on the same page.

ko.7z has the following header properties:

emptyfiles  = [True, True, True]
files = [
  {'emptystream': False, 'filename': 'Config.xml', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128},
  {'emptystream': True, 'filename': 'EventConsumer.log', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128},
  {'emptystream': False, 'filename': 'EventConsumer.txt', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128}, 
  {'emptystream': False, 'filename': 'processes1.csv', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128},
  {'emptystream': True, 'filename': 'processes1.log', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128},
  {'emptystream': False, 'filename': 'processes2.csv', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128},
  {'emptystream': True, 'filename': 'processes2.log', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128}
]
packsizes = [3100, 1, 21160, 1, 18794, 1]
unpacksizes = [28360, 0, 1068, 136588, 0, 124443, 0]
num_unpackstream_folders = [1, 0, 2, 0, 1, 0]
digests = [3531454146, 1149430100, 529556584, 1488982218]

From the above output, the CRC32 values are valid for all the non-empty files, but they are missing for the empty files (4 digests for 7 files).
When I run the code with the raise CrcError("{}".format(f.filename)) statement patched to print some info instead, this is what I get:

CRCError ! EventConsumer.txt: expected=1149430100, got=0
CRCError ! processes1.csv: expected=529556584, got=1149430100
CRCError ! processes2.csv: expected=1488982218, got=529556584
...

The CRC32 values are still valid, but shifted by one entry: each file is checked against data that belongs to the previous stream.
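
To make the shift easier to see, here is a small sketch that takes the digests from the header dump and the three "got" values printed above, and reports which stream each computed CRC32 actually matches. The pairing of digests with non-empty files in header order is an assumption, and the helper `explain` is hypothetical; the elided lines of the patched output are not reconstructed.

```python
import zlib

# Digests from the header dump, paired in order with the non-empty files
# (assumption: digests are stored in the same order as the non-empty files).
digests = [3531454146, 1149430100, 529556584, 1488982218]
non_empty = ["Config.xml", "EventConsumer.txt", "processes1.csv", "processes2.csv"]
expected = dict(zip(non_empty, digests))

# Values printed by the patched CrcError site, as reported above.
got = {
    "EventConsumer.txt": 0,
    "processes1.csv": 1149430100,
    "processes2.csv": 529556584,
}


def explain(name):
    """Describe which stream the computed CRC32 of `name` really corresponds to."""
    value = got[name]
    if value == zlib.crc32(b""):
        return "CRC32 of zero bytes"
    owners = [other for other, crc in expected.items() if crc == value]
    return "CRC32 of " + owners[0] if owners else "unknown"


for name in got:
    print(f"{name}: expected={expected[name]}, got={got[name]} ({explain(name)})")
```

EventConsumer.txt is checked against an empty read, and each later file is checked against the data of the file before it, which is consistent with the reader consuming one extra stream for an empty entry.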

We will be investigating our usage of the 7z library in DFIR-ORC/dfir-orc#49, but maybe there is something we are (both) missing in the handling of empty files and their CRC32 ? What do you think @miurahr ?

@DoNck
Contributor Author

DoNck commented Oct 5, 2021

Hi, is there any update regarding this situation?

Best regards

@miurahr
Owner

miurahr commented Oct 7, 2021

This is a real corner case and difficult to analyze.
It is still under investigation.

@DoNck If you have any findings, comments are welcome!

@sc-anssi

Hi,
We fixed (DFIR-ORC/dfir-orc@7d8bf43) the handling of empty streams added to an archive to match what is done for empty files. However, 7z does not really specify that it should be handled one way or the other, so it might be interesting for your implementation in py7zr to handle both cases as the 7z CLI seems to do.
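
One way a reader could tolerate both layouts is to hand out data streams only from folders that declare at least one unpack stream, and to ignore the rest. The sketch below is a hypothetical illustration built on the header values quoted earlier; it is not py7zr's code, and that the 7z CLI works this way is only inferred from its listing. The function name `assign_streams` is an assumption.

```python
def assign_streams(non_empty_files, packsizes, streams_per_folder):
    """Map each non-empty file to (folder index, packed size of that folder).

    Folders that declare zero unpack streams are skipped, even when they
    carry packed bytes, so they never shift the file-to-stream pairing.
    Assumption: packsizes and streams_per_folder have one entry per folder.
    """
    remaining = iter(non_empty_files)
    mapping = {}
    for index, (packed, streams) in enumerate(zip(packsizes, streams_per_folder)):
        for _ in range(streams):  # runs zero times for a streamless folder
            mapping[next(remaining)] = (index, packed)
    return mapping


# Values from the ko.7z header dump quoted in this thread.
non_empty = ["Config.xml", "EventConsumer.txt", "processes1.csv", "processes2.csv"]
packsizes = [3100, 1, 21160, 1, 18794, 1]
num_unpackstream_folders = [1, 0, 2, 0, 1, 0]

mapping = assign_streams(non_empty, packsizes, num_unpackstream_folders)
for name, (folder, packed) in mapping.items():
    print(f"{name}: folder {folder}, packed size {packed}")

# Packed bytes of the folders actually used, counting each folder once.
used = {folder: packed for folder, packed in mapping.values()}
print("total packed size:", sum(used.values()))
```

With these values, EventConsumer.txt and processes1.csv share folder 2, and the packed total of the folders in use comes to 43054, the same figure the 7z command reports in its summary line.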
Regards

@github-actions

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days