When coercing columns to strings, boolean cells turn into null #250
Comments
Hi @severinh and thanks for your kind words |
Btw @lukapeschke and myself tend to prefer the |
@PrettyWood Thanks for the quick response!
Sure! Makes sense to be more explicit. The main thing I care about is that we don't lose legitimate data.
Tried that, and unfortunately that did not seem to be enough yet. The output was still PS: Another problem we ran into: A cell like |
Just made a quick fix @severinh |
Amazing! Thank you. I've opened #252 for the issue with the date coercion. |
In our project, we have Excel files with mixed data types. Because of this, we need to coerce all columns to strings, and interpret the data downstream. We're currently blocked from adopting fastexcel because it does not coerce booleans as expected.
How to reproduce
Suppose you have an Excel file with the following data:
=TRUE
=FALSE
"some string"
Now let's read this Excel file into a Polars dataframe, while coercing the column to strings:
excel_reader.load_sheet(0, header_row=1, dtypes={"Header": "string"}).to_polars()
This produces the following data frame:
null
null
"some string"
Expected behavior
I would have expected fastexcel to coerce the boolean values to "0" and "1" instead of losing them. That is:
"1"
"0"
"some string"
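To make the expectation concrete, here is a minimal sketch in plain Python of the mapping described in this section. It is not fastexcel's actual (Rust) implementation: coerce_cell_to_string is a hypothetical helper name, and the "1"/"0" representation is only the expectation stated in this issue.

```python
from typing import Optional, Union

# Simplified stand-in for the kinds of values a spreadsheet cell can hold.
Cell = Union[bool, int, float, str, None]


def coerce_cell_to_string(cell: Cell) -> Optional[str]:
    """Hypothetical helper: turn one cell value into a string without dropping booleans."""
    if cell is None:
        # Genuinely empty cells stay null.
        return None
    if isinstance(cell, bool):
        # Check bool first: in Python, bool is a subclass of int.
        return "1" if cell else "0"
    return str(cell)


# The three cells from the example sheet: =TRUE, =FALSE, "some string".
column = [True, False, "some string"]
coerced = [coerce_cell_to_string(cell) for cell in column]
print(coerced)  # ['1', '0', 'some string']
```

The point of the sketch is only that a boolean cell should map to some non-null string; the exact string representation is open to discussion.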
Test case
I've cloned the fastexcel repo and written a new unit test for this, which currently fails.
Excel sheet: sheet-bool.xlsx
Unfortunately, I'm not experienced with Rust, so I have not yet figured out where/how to change the Rust code to make the test pass.
Closing words
First of all, big thanks for building fastexcel! I'm eager to migrate to fastexcel because it can output Polars directly, which is what our downstream pipelines are based on. That will give us a massive performance boost.
It would be much appreciated if you either gave me some pointers on where/how to make this change in the Rust code (so I can open a PR), or made the fix yourself.