decimal-parameter for reading CSV-files? #11

Closed
roland-KA opened this issue Jul 5, 2024 · 7 comments
Comments

@roland-KA

Is there a decimal-parameter (as in CSV.jl) to specify if the fractional part of decimal numbers is separated by '.' (like 3.14) or by ',' (like 3,14) when reading CSV files?

@drizk1
Member

drizk1 commented Jul 5, 2024

I will check, but I'm not sure at present. Do you have a dataset you need to read with that format?

@roland-KA
Author

In Germany (and in many other European countries), a comma as the decimal separator is standard, so it's an everyday use case.

@drizk1
Member

drizk1 commented Jul 5, 2024

OK. If you have a small, easily accessible dataset, please pass it along so I can begin testing with it while I figure out what CSV.jl offers and how to make TidierFiles support it. If not, no worries, I'll find one to use.

@roland-KA
Author

CommaExample.csv

This is an example with three columns A, B and C, where B and C are floating-point numbers that use a comma as the decimal separator. The column separator in these files is a semicolon, so that it cannot be confused with the decimal comma inside the numbers.

@drizk1
Member

drizk1 commented Jul 6, 2024

OK, so there is an argument in CSV.jl to choose the decimal separator.

read_csv, read_delim and read_tsv do not yet support that additional argument. To keep things consistent with the readr R package, I will likely add a read_csv2 function that uses ',' as the decimal mark and '.' as the grouping mark, and then add options to read_delim to give users more control if needed.
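
For reference, a minimal sketch of the underlying CSV.jl option, assuming the CSV and DataFrames packages are installed. The inline sample data is made up for illustration and is not the attached CommaExample.csv; it only shows how the `delim` and `decimal` keywords of CSV.jl behave together.

```julia
using CSV, DataFrames

# Hypothetical sample: semicolon-delimited columns, decimal commas in B and C.
data = """
A;B;C
j;90,5;89,25
t;52,0;7,75
"""

# `delim` sets the column separator; `decimal` sets the character that
# separates the integer part from the fractional part of a number.
df = CSV.read(IOBuffer(data), DataFrame; delim = ';', decimal = ',')
```

With these keywords, B and C should parse as Float64 columns rather than as strings.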

@drizk1
Member

drizk1 commented Jul 6, 2024

With v.1.2, which should be on the registry in 20 minutes or so, there are two ways to do it:

  • read_csv2 defaults to delim = ';' and decimal = ','
  • for more control you can use read_delim as well

julia> read_csv2("path/to/CommaExample.csv")
10×3 DataFrame
 Row │ A        B        C       
     │ String1  Float64  Float64 
─────┼───────────────────────────
   1 │ j           90.0     89.0
   2 │ t           52.0      7.0
   3 │ z           47.0     85.0
   4 │ r           72.0     67.0
   5 │ j           50.0     85.0
   6 │ x            5.0     29.0
   7 │ a           84.0      7.0
   8 │ t           38.0     28.0
   9 │ w           96.0     42.0
  10 │ a           12.0     62.0

julia> read_delim("path/to/CommaExample.csv", decimal = ',', delim = ';')
10×3 DataFrame
 Row │ A        B        C       
     │ String1  Float64  Float64 
─────┼───────────────────────────
   1 │ j           90.0     89.0
   2 │ t           52.0      7.0
   3 │ z           47.0     85.0
   4 │ r           72.0     67.0
   5 │ j           50.0     85.0
   6 │ x            5.0     29.0
   7 │ a           84.0      7.0
   8 │ t           38.0     28.0
   9 │ w           96.0     42.0
  10 │ a           12.0     62.0

Please reopen or let us know if you run into issues or want additional features.

@drizk1 drizk1 closed this as completed Jul 6, 2024
@roland-KA
Author

Wow, that went fast! Thank you very much 😊👍
