# df_io

Python helpers for doing IO with Pandas DataFrames.

## Available methods

### read_df

This method supports:

- bzip2/gzip/zstandard compression
- passing parameters to Pandas' readers
- reading from anything that smart_open supports (local files, AWS S3, etc.)
- most of the formats that Pandas supports
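
A minimal read sketch, assuming `read_df` mirrors `write_df`'s keyword style (the `fmt` keyword and a `reader_options` pass-through are assumptions here, not confirmed by this README; verify against the API doc):

```python
import df_io

# Read a gzip-compressed CSV from S3; format and compression
# are inferred from the file extension.
df = df_io.read_df('s3://bucket/dir/mydata.csv.gz')

# Hypothetical: forward options to the underlying Pandas reader
# (`reader_options` is assumed by symmetry with `writer_options`).
df = df_io.read_df('s3://bucket/dir/mydata.json.gz', fmt='json',
                   reader_options={'lines': True})
```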

### write_df

This method supports:

- streaming writes
- chunked writes
- bzip2/gzip/zstandard compression
- passing parameters to Pandas' writers
- writing to anything that smart_open supports (local files, AWS S3, etc.)
- most of the formats that Pandas supports

## Documentation

API doc

## Examples

Write a Pandas DataFrame (`df`) to an S3 path in CSV format (the default):

```python
import df_io

df_io.write_df(df, 's3://bucket/dir/mydata.csv')
```

The same with gzip compression:

```python
df_io.write_df(df, 's3://bucket/dir/mydata.csv.gz')
```

With zstandard compression using pickle:

```python
df_io.write_df(df, 's3://bucket/dir/mydata.pickle.zstd', fmt='pickle')
```

Using JSON lines:

```python
df_io.write_df(df, 's3://bucket/dir/mydata.json.gz', fmt='json')
```

Passing writer parameters:

```python
df_io.write_df(df, 's3://bucket/dir/mydata.json.gz', fmt='json', writer_options={'lines': False})
```

Chunked write (splitting the df into equally sized parts and writing a separate output for each):

```python
df_io.write_df(df, 's3://bucket/dir/mydata.json.gz', fmt='json', chunksize=10000)
```
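
Reading one of the outputs above back into a DataFrame (a hedged sketch: this README shows the `fmt` keyword only for `write_df`, and it is assumed to work the same way for `read_df`):

```python
# Load the zstandard-compressed pickle written earlier.
df2 = df_io.read_df('s3://bucket/dir/mydata.pickle.zstd', fmt='pickle')
```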