ParseError with single-line files stored as io.BytesIO #370

Open
william-watson-swri opened this issue Sep 26, 2024 · 0 comments

Version
pymzml: 2.5.10
Python: 3.11.7

Description
I'm receiving mzML files as bytes, wrapping these in io.BytesIO, and then passing that to pymzml.run.Reader:

reader = pymzml.run.Reader(io.BytesIO(mzml_bytes))

This sometimes raises the following exception:

ParseError: no element found: line 1, column 0

Why
Some of the mzML files I'm using have no line breaks (the whole document is on a single line), and the _guess_encoding function breaks on these. Looking at the pymzml source, the io.BytesIO objects travel through this line, which in turn calls the culprit, _guess_encoding:

match = regex_patterns.FILE_ENCODING_PATTERN.search(mzml_file.readline())

If the file has no line breaks, the .readline() call consumes the entire BytesIO and leaves the stream position at the end, so the later XML parsing finds no data and fails.
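The behaviour can be reproduced without pymzml. This is a minimal sketch using only the standard library; the payload is a made-up single-line document, not a real mzML file:

```python
import io

# Hypothetical single-line payload: the XML declaration and body share one line.
single_line = b'<?xml version="1.0" encoding="utf-8"?><mzML></mzML>'
buf = io.BytesIO(single_line)

# With no b"\n" in the data, readline() returns the whole buffer.
first_line = buf.readline()

# The stream position is now at the end, so a later reader gets nothing.
remaining = buf.read()

# Rewinding makes the full payload readable again.
buf.seek(0)
after_seek = buf.read()
```

Here `remaining` is empty, which matches the "no element found: line 1, column 0" error from the parser.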

Workaround/fix
I'm currently inserting a line break after the XML declaration at the start of the data before passing it to pymzml:

data = re.sub(br'(<\?xml[^>]+>)', br'\1\n', mzml_bytes, count=1)
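For reference, here is the effect of that substitution on a made-up single-line payload (an illustrative stand-in, not a real mzML file). After the substitution, readline() stops at the end of the XML declaration and the rest of the document stays in the stream:

```python
import io
import re

# Hypothetical single-line payload used only for illustration.
mzml_bytes = b'<?xml version="1.0" encoding="utf-8"?><mzML></mzML>'

# Insert a newline right after the XML declaration (first match only).
data = re.sub(br'(<\?xml[^>]+>)', br'\1\n', mzml_bytes, count=1)

buf = io.BytesIO(data)
declaration = buf.readline()  # now only the declaration line is consumed
rest = buf.read()             # the document body is still available
```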

I believe this could also be fixed by just adding mzml_file.seek(0) after the offending line.
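A rough sketch of what that could look like, assuming the encoding is sniffed from the first line of a seekable stream. The helper name and the regex below are hypothetical stand-ins, not pymzml's actual _guess_encoding or FILE_ENCODING_PATTERN. Restoring the original position with tell()/seek() rather than a hard-coded seek(0) is just one possible variant:

```python
import io
import re
import xml.etree.ElementTree as ET

# Stand-in pattern for illustration only; not pymzml's FILE_ENCODING_PATTERN.
ENCODING_PATTERN = re.compile(rb'encoding="(?P<encoding>[A-Za-z0-9_-]+)"')


def guess_encoding_and_rewind(stream, default="utf-8"):
    """Peek at the first line for an encoding, then restore the stream position."""
    start = stream.tell()
    match = ENCODING_PATTERN.search(stream.readline())
    # Undo the readline() so a later parser sees the full document.
    stream.seek(start)
    if match:
        return match.group("encoding").decode("ascii")
    return default


# Hypothetical single-line document, as in the failing case.
buf = io.BytesIO(b'<?xml version="1.0" encoding="ISO-8859-1"?><mzML><run/></mzML>')
encoding = guess_encoding_and_rewind(buf)
root = ET.parse(buf).getroot()  # parsing succeeds because the stream was rewound
```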
