How to read multiple files? #13

Closed
eitsupi opened this issue Jul 11, 2024 · 7 comments

Comments

@eitsupi

eitsupi commented Jul 11, 2024

It would be great to have syntax in place for reading multiple files as a single DataFrame.

@drizk1
Member

drizk1 commented Jul 11, 2024

To make sure I understand: would the goal be to pass multiple file paths and have them all appended/stacked into the same DataFrame?

@eitsupi
Author

eitsupi commented Jul 11, 2024

I want something like this:
https://duckdb.org/docs/data/multiple_files/overview.html

I gave up on using Julia for today because there didn't seem to be an easy way to do this. (Of course DuckDB would do it, but there didn't seem to be any point in using Julia if I'm using DuckDB anyway.)

@drizk1
Member

drizk1 commented Jul 11, 2024

We can definitely get something like the below in place for you (`arguments = here` is just a placeholder for the usual keyword arguments):

read_csv(["flights1.csv", "flights2.csv"], arguments = here)

@drizk1
Member

drizk1 commented Jul 11, 2024

While I get that added for you, hopefully within the week, the only other thing I'll mention is that, if you'd like, you can use TidierDB, which gives you TidierData and Tidier syntax on DuckDB.

@eitsupi
Author

eitsupi commented Jul 12, 2024

> I'll mention is that if you'd like you can use TidierDB which give you TidierData and Tidier Syntax on duckdb

Thank you. Of course I tried installing it, but it seemed to pull in too many dependencies just to use the DuckDB read function.

Perhaps it would be better to use package extensions, so that only the dependencies needed regardless of backend are installed by default?

@drizk1
Member

drizk1 commented Jul 12, 2024

Moving to package extensions is definitely something I want to do to lighten the dependencies; I just haven't found the time to do it yet, unfortunately.

Will definitely get multiple file paths working for you soon, though.

@drizk1
Member

drizk1 commented Jul 12, 2024

Alright, v.1.3 will be headed to the registry shortly (~30 mins). It supports reading multiple files at once, when they are passed as a vector, for:
- read_csv
- read_csv2
- read_delim
- read_tsv
- read_parquet

Here is an example. Please let us know if there are other issues or features you would like addressed.

path = "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv";
read_csv([path, path], col_select = [:model, :mpg, :hp], skip = 10)
44×3 DataFrame
 Row │ model                mpg      hp    
     │ String31             Float64  Int64 
─────┼─────────────────────────────────────
   1 │ Merc 280C               17.8    123
   2 │ Merc 450SE              16.4    180
   3 │ Merc 450SL              17.3    180
   4 │ Merc 450SLC             15.2    180
   5 │ Cadillac Fleetwood      10.4    205
   6 │ Lincoln Continental     10.4    215
   7 │ Chrysler Imperial       14.7    230
  ⋮  │          ⋮              ⋮       ⋮
  38 │ Fiat X1-9               27.3     66
  39 │ Porsche 914-2           26.0     91
  40 │ Lotus Europa            30.4    113
  41 │ Ford Pantera L          15.8    264
  42 │ Ferrari Dino            19.7    175
  43 │ Maserati Bora           15.0    335
  44 │ Volvo 142E              21.4    109
                            30 rows omitted
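
For comparison, here is a rough sketch of the same "stack several files into one DataFrame" idea done by hand. It assumes CSV.jl and DataFrames.jl rather than the readers listed above, and the file names and columns are made up for illustration. It is not meant to describe how the vector form of `read_csv` is implemented.

```julia
using CSV, DataFrames

# Hypothetical sample data: two small CSV files with the same columns,
# written to a temporary directory so the example is self-contained.
dir = mktempdir()
files = [joinpath(dir, "flights$i.csv") for i in 1:2]
write(files[1], "carrier,delay\nAA,5\nUA,12\n")
write(files[2], "carrier,delay\nDL,0\n")

# Read each file into its own DataFrame, then stack them row-wise.
df = reduce(vcat, [CSV.read(f, DataFrame) for f in files])

println(nrow(df))  # 2 rows from the first file + 1 from the second
```

Plain `vcat` expects every file to have the same column names; if they differ, `vcat(dfs...; cols = :union)` from DataFrames.jl is one option.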

drizk1 closed this as completed Jul 12, 2024