read_excel reads big numbers incorrectly when they use a cell reference #19169

Closed
2 tasks done
breanna-gream opened this issue Oct 10, 2024 · 4 comments
Labels
bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars), upstream issue


breanna-gream commented Oct 10, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df1 = pl.read_excel("test.xlsx", sheet_name="Sheet1")
df2 = pl.read_excel("test.xlsx", sheet_name="Sheet2")

df1.item(), df2.item()

# Returns
# (-8086931554011838357, -8086931554011838464)

The contents of test.xlsx (also attached below) are:

col1
'-8086931554011838357

in Sheet1 (note the leading apostrophe, which stores this number as text). Sheet2 looks the same, except that its cell is a formula referencing Sheet1!A2 instead of a static value.

test.xlsx

Log output

dataframe filtered
dataframe filtered

Issue description

I think this might be an issue with the calamine engine. Using the xlsx2csv engine seems to fix it, but I would prefer not to use it, as I have found that calamine handles floating point precision better.
Apologies if this was not the correct place to raise this - please let me know.
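For anyone who needs the exact value before an upstream fix lands, the workaround mentioned above can be sketched as follows. This only restates what was reported in this issue (that the xlsx2csv engine returned the right number for this file). The helper name `read_sheet_via_xlsx2csv` is made up for illustration, and the sketch assumes the `xlsx2csv` package is installed.

```python
from pathlib import Path


def read_sheet_via_xlsx2csv(path, sheet_name):
    """Read one sheet with the xlsx2csv engine instead of calamine.

    Hypothetical helper: only `pl.read_excel` and its `engine`
    parameter are part of Polars itself.
    """
    # Imported here so that merely defining the helper does not require polars.
    import polars as pl

    return pl.read_excel(path, sheet_name=sheet_name, engine="xlsx2csv")


# Only attempt the read when the attached workbook is actually present.
if Path("test.xlsx").exists():
    print(read_sheet_via_xlsx2csv("test.xlsx", "Sheet2"))
```

The trade-off described above still applies: switching engines may change how other floating point cells are parsed.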

Expected behavior

The correct number is returned even when the cell uses a formula instead of a static value, i.e. the output would be:

(-8086931554011838357, -8086931554011838357)
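As a side note, the incorrect value from Sheet2 is exactly what the expected integer becomes after a round trip through a 64-bit float. The sketch below only demonstrates that arithmetic. It does not show what calamine does internally, which is an assumption not confirmed in this thread.

```python
# Values copied from the example above.
expected = -8086931554011838357  # text value stored in Sheet1!A2
observed = -8086931554011838464  # value read back from Sheet2

# float64 has a 53-bit significand, so for magnitudes between 2**62 and
# 2**63 only multiples of 1024 can be represented exactly.
via_float = int(float(expected))

print(via_float == observed)  # True
print(observed % 1024)        # 0
```

Any integer whose magnitude exceeds 2**53 can be affected by this kind of rounding, so identifiers of this size are safest when they stay as text or as integers end to end.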

Installed versions

--------Version info---------
Polars:              1.9.0
Index type:          UInt32
Platform:            Windows-10-10.0.19045-SP0
Python:              3.11.9 (tags/v3.11.9:de54cf5, Apr  2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            0.11.6
fsspec               <not installed>
gevent               <not installed>
great_tables         <not installed>
matplotlib           <not installed>
nest_asyncio         <not installed>
numpy                2.1.1
openpyxl             <not installed>
pandas               <not installed>
pyarrow              17.0.0
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             0.8.3
xlsxwriter           <not installed>
breanna-gream added the bug, needs triage, and python labels Oct 10, 2024

PrettyWood commented Oct 14, 2024

@breanna-gream It does indeed come from calamine (from https://github.com/tafia/calamine/blob/master/src/xlsx/cells_reader.rs#L372 to be precise).
I don't have enough XML knowledge to understand exactly why this logic is there. It is probably best to open an issue on the calamine side.

EDIT: I opened tafia/calamine#472

@breanna-gream
Author

@PrettyWood looks like your PR tafia/calamine#472 has been merged (thank you for creating that!)

However, I have tested again using my original example (after re-installing polars[calamine]) and am still seeing the same issue. Is there something else I need to do for this fix to flow through?

@PrettyWood

We need to wait for a new calamine release. Once it's out we'll bump calamine in fastexcel and create a new fastexcel release.
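Since calamine is bundled inside fastexcel, the fastexcel version is the one to watch after upgrading. A minimal sketch for checking what is installed, using only the standard library (the helper name `installed_version` is made up for illustration, and no specific fixed version is implied):

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the two packages involved in reading Excel files here.
for name in ("polars", "fastexcel"):
    print(f"{name}: {installed_version(name) or '<not installed>'}")
```

`pl.show_versions()` prints the same information along with the other optional dependencies, as in the "Installed versions" section above.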

Collaborator

alexander-beedie commented Jan 9, 2025

FYI: closing this as it's not a Polars-specific issue (fix will come from upstream).
