Handling of NAs #1

Open
piccolbo opened this issue Oct 20, 2014 · 1 comment

Comments

@piccolbo
Collaborator

Hi,
I have a data frame containing some NAs in one column. When I write it out and read it back in, that column contains only NAs. What gives?

test case

df = 
structure(list(x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), y = structure(1:10, .Label = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"), class = "factor"), 
    fac = structure(c(1L, NA, NA, 2L, 2L, 1L, NA, 2L, 1L, 1L), .Label = c("b", 
    "c"), class = "factor")), .Names = c("x", "y", "fac"), row.names = c(NA, 
-10L), class = "data.frame")

write.avro(df, "/tmp/testxx.avro")
read.avro("/tmp/testxx.avro")
   x  y  fac
1  1  1 <NA>
2  1  2 <NA>
3  1  3 <NA>
4  1  4 <NA>
5  1  5 <NA>
6  1  6 <NA>
7  1  7 <NA>
8  1  8 <NA>
9  1  9 <NA>
10 1 10 <NA>

Thanks

@jamiefolson
Contributor

That is a very good question. An NA in a factor likely requires serialization
as a union of an enum with null, which may currently be either not serialized
or serialized incorrectly. I made some simplifying assumptions that were
sufficient at the time, but we may need to extend the logic around unions.
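
To make the "union of an enum with null" idea concrete, here is a minimal base R sketch. It does not use or describe ravro internals: the helpers `nullable_enum_schema`, `encode_nullable_enum`, and `decode_nullable_enum` are hypothetical names made up for illustration. It assumes the Avro convention that a union value is written as a zero-based branch index followed by the value, and that an enum value is written as a zero-based symbol index.

```r
# Hypothetical helper (not part of ravro): describe a factor column as an
# Avro union of "null" and an enum whose symbols are the factor levels.
nullable_enum_schema <- function(name, f) {
  stopifnot(is.factor(f))
  list(
    name = name,
    type = list(
      "null",
      list(type = "enum", name = paste0(name, "_levels"), symbols = levels(f))
    )
  )
}

# Hypothetical helper: for each value, pick the union branch (0 = null,
# 1 = enum) and, for non-missing values, the zero-based enum symbol index.
encode_nullable_enum <- function(f) {
  codes <- as.integer(f)
  data.frame(
    branch = ifelse(is.na(codes), 0L, 1L),
    symbol = ifelse(is.na(codes), NA_integer_, codes - 1L)
  )
}

# Inverse of the above: rebuild the factor, leaving NA only where the
# null branch was chosen.
decode_nullable_enum <- function(enc, symbols) {
  factor(symbols[enc$symbol + 1L], levels = symbols)
}

# Same values as the `fac` column in the test case above.
fac <- factor(c("b", NA, NA, "c", "c", "b", NA, "c", "b", "b"))
schema <- nullable_enum_schema("fac", fac)
enc <- encode_nullable_enum(fac)
roundtrip <- decode_nullable_enum(enc, schema$type[[2]]$symbols)
stopifnot(identical(roundtrip, fac))
```

In this sketch only the three NA positions take the null branch, so a correct round trip should bring back the original mix of "b", "c", and NA rather than a column of all NAs.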