Cannot create dataframe containing null values #20

Closed
1 of 2 tasks
forgetso opened this issue Nov 16, 2022 · 6 comments · Fixed by #118
Labels
bug Something isn't working

Comments

@forgetso

forgetso commented Nov 16, 2022

Have you tried latest version of polars?

  • yes
  • no

What version of polars are you using?

0.6.0

What operating system are you using polars on?

Linux 5.4.0-132-generic #148-Ubuntu

What node version are you using

node --version
v16.13.2

Describe your bug.

Creating a dataframe that contains a null value triggers an unwrap on a None value in Rust, which causes a panic.

What are the steps to reproduce the behavior?

Try to create a dataframe with a null value

What is the actual behavior?

node                              
Welcome to Node.js v16.13.2.
Type ".help" for more information.
> pl = require('nodejs-polars')

> pl.DataFrame([{a:1, b:2, c:null}])
thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', src/dataframe.rs:1548:29

What is the expected behavior?

Polars should have created the dataframe with a null value for column c

> pl.DataFrame([{a:1, b:2, c:null}])
Proxy [
  shape: (1, 3)
  ┌─────┬─────┬──────┐
  │ a   ┆ b   ┆ c    │
  │ --- ┆ --- ┆ ---  │
  │ f64 ┆ f64 ┆ f64  │
  ╞═════╪═════╪══════╡
  │ 1.0 ┆ 2.0 ┆ null │
  └─────┴─────┴──────┘,
  {
    get: [Function: get],
    set: [Function: set],
    has: [Function: has],
    ownKeys: [Function: ownKeys],
    getOwnPropertyDescriptor: [Function: getOwnPropertyDescriptor]
  }
]
@forgetso forgetso added the bug Something isn't working label Nov 16, 2022
@johanroelofsen
Contributor

While looking into this one, I noticed that providing arrays into the object:

const lf = pl.DataFrame({ a: [1], b: [2], c: [null] });
console.log(lf);

returns

shape: (1, 3)
┌─────┬─────┬──────┐
│ a   ┆ b   ┆ c    │
│ --- ┆ --- ┆ ---  │
│ f64 ┆ f64 ┆ f64  │
╞═════╪═════╪══════╡
│ 1.0 ┆ 2.0 ┆ null │
└─────┴─────┴──────┘

Given this note in the documentation:

Interface DataFrame

A DataFrame is a two-dimensional data structure that represents data as a table with rows and columns.
Param

Object, Array, or Series Two-dimensional data in various forms. object must contain Arrays. Array may contain Series or other Arrays.

I am a little hesitant about whether I should look into this further.
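
For anyone who already has row-oriented data, the column-oriented form above can be produced with a small pivot step. This is only a sketch: `rowsToColumns` is a hypothetical helper written for this thread, not part of nodejs-polars, and it assumes that a key missing from a row should be treated as null.

```javascript
// Hypothetical helper (not a nodejs-polars API): pivot an array of row
// objects into an object of column arrays.
function rowsToColumns(rows) {
  // Collect every key that appears in any row, in first-seen order.
  const keys = [];
  const seen = new Set();
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (!seen.has(key)) {
        seen.add(key);
        keys.push(key);
      }
    }
  }

  // Build one array per key; assumption: an absent key becomes null.
  const columns = {};
  for (const key of keys) {
    columns[key] = rows.map((row) => (row[key] === undefined ? null : row[key]));
  }
  return columns;
}

// Same shape as the object literal passed to pl.DataFrame above.
const columns = rowsToColumns([{ a: 1, b: 2, c: null }]);
console.log(columns);
```

The result should then be usable as `pl.DataFrame(columns)`, since it has the same shape as the `{ a: [1], b: [2], c: [null] }` call shown above.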

@forgetso
Author

It's probably not a high priority ticket as you've demonstrated a workaround. I am coming from pandas where the other syntax does work.

>>> pd.DataFrame([{'a':1, 'b':2, 'c':None}])
   a  b     c
0  1  2  None

@Bidek56
Collaborator

Bidek56 commented Apr 6, 2023

I have created a PR for this issue.

@Bidek56
Collaborator

Bidek56 commented Apr 6, 2023

But I do see a problem after my PR.
pl.DataFrame([{a:1, b:2, c:null}]) will omit column c, unless one of the rows has a non-null value.

@universalmind303
Collaborator

But I do see a problem after my PR.
pl.DataFrame([{a:1, b:2, c:null}]) will omit column c, unless one of the rows has a non-null value.

I believe this is a known issue in the polars JSON parser, which the JS parser is mostly based on.

pola-rs/polars#7858

@daolanfler

For those who want a temporary solution: check whether the column is present in the df.columns array, then use withColumn to add a column full of null values.

const data3 = [
  { col1: "b", col2: "d", str: "1 -> 1 -> 2", time: 1691565256703, col3: null },
  { col1: "b", col2: "c", str: "2 -> 2 -> 1", col3: null },
  { col1: "c", col2: "a", str: "3 -> 3 -> 3", col3: null },
];

let df3 = pl.DataFrame(data3);
if (!df3.columns.includes('col3')) {
    df3 = df3.withColumn(pl.lit(null).alias("col3"));
}

┌──────┬──────┬─────────────┬───────────┬──────┐
│ col1 ┆ col2 ┆ str         ┆ time      ┆ col3 │
│ ---  ┆ ---  ┆ ---         ┆ ---       ┆ ---  │
│ str  ┆ str  ┆ str         ┆ f64       ┆ null │
╞══════╪══════╪═════════════╪═══════════╪══════╡
│ b    ┆ d    ┆ 1 -> 1 -> 2 ┆ 1.6916e12 ┆ null │
│ b    ┆ c    ┆ 2 -> 2 -> 1 ┆ null      ┆ null │
│ c    ┆ a    ┆ 3 -> 3 -> 3 ┆ null      ┆ null │
└──────┴──────┴─────────────┴───────────┴──────┘
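
If several columns might be all-null, the same check can be generalized instead of hard-coding `col3`. A rough sketch follows; `missingColumns` is a hypothetical helper, not a nodejs-polars function, and it assumes the expected columns are exactly the keys that appear in the input rows.

```javascript
// Hypothetical helper (not a nodejs-polars API): list the keys that appear
// in the input rows but are absent from the resulting DataFrame's columns.
function missingColumns(rows, dfColumns) {
  const present = new Set(dfColumns);
  const missing = [];
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (!present.has(key)) {
        present.add(key); // avoid reporting the same key twice
        missing.push(key);
      }
    }
  }
  return missing;
}

// Intended use with the snippet above (needs nodejs-polars, so left as a comment):
//   for (const name of missingColumns(data3, df3.columns)) {
//     df3 = df3.withColumn(pl.lit(null).alias(name));
//   }
const rows = [
  { col1: "b", col2: "d", col3: null },
  { col1: "c", col2: "a", col3: null },
];
console.log(missingColumns(rows, ["col1", "col2"]));
```

Columns added this way are appended at the end, so the column order may differ from the key order in the input rows.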
