Playbook for AWS Lambda, AWS S3 (e.g. csv files), and pandas.

pkgit123/aws-s3-to-pandas

aws-s3-to-pandas

Playbook for AWS Lambda, AWS S3 (e.g. csv files), and pandas.

Pre-processing code for longitudinal study of options data by ticker. Uses Python and AWS Lambda.

AWS Lambda code to split and shuffle end-of-day options data by ticker. Learned that a pandas groupby object can be converted into a list of (key, dataframe) tuples.

The concept is similar to the split-map-shuffle-reduce paradigm used in "big data" analysis.
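As a minimal sketch of the groupby-to-list-of-tuples conversion mentioned above: iterating a pandas groupby object yields (key, dataframe) pairs, so wrapping it in `list()` gives the list of tuples. The sample rows and the column names (`Ticker`, `Type`, `Strike`) are assumptions for illustration; the real file layout is not shown here.

```python
import pandas as pd

# Hypothetical end-of-day options rows; column names are assumptions.
df = pd.DataFrame({
    "Ticker": ["AAPL", "MSFT", "AAPL", "MSFT"],
    "Type": ["call", "put", "put", "call"],
    "Strike": [150.0, 300.0, 145.0, 310.0],
})

# Iterating a groupby object yields (key, sub-dataframe) pairs,
# so list() materialises it as a list of tuples.
list_by_ticker = list(df.groupby("Ticker"))

for str_ticker, df_by_ticker in list_by_ticker:
    # Each tuple holds the ticker string and the rows for that ticker.
    print(str_ticker, len(df_by_ticker))
```

Grouping by a single column name (a string rather than a list) keeps each key a plain string, which is convenient when the key is later used in an S3 object name.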

Steps:

  1. Download CSV files from AWS S3
  2. Read CSV into pandas dataframe
  3. Split the dataframe into list-of-tuples (str_ticker, df_by_ticker)
  4. Upload each per-ticker dataframe back to AWS S3, keyed by ticker
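The four steps above could be sketched roughly as follows. This is not the repository's actual Lambda code: the function name `split_and_upload`, its parameters, the `Ticker` column name, and the `<prefix>/<ticker>.csv` key layout are all hypothetical. The S3 client is passed in as an argument so the function can be exercised without AWS credentials; it only relies on the `get_object` and `put_object` calls of a boto3 S3 client.

```python
import io

import pandas as pd


def split_and_upload(s3_client, bucket, source_key, dest_prefix, ticker_col="Ticker"):
    """Download one CSV from S3, split it by ticker, upload one CSV per ticker."""
    # Step 1: download the CSV object; the response dict carries a readable "Body".
    response = s3_client.get_object(Bucket=bucket, Key=source_key)
    raw_bytes = response["Body"].read()

    # Step 2: read the CSV bytes into a pandas dataframe.
    df = pd.read_csv(io.BytesIO(raw_bytes))

    # Step 3: split into a list of (str_ticker, df_by_ticker) tuples.
    list_by_ticker = list(df.groupby(ticker_col))

    # Step 4: upload each per-ticker dataframe under its own key.
    uploaded_keys = []
    for str_ticker, df_by_ticker in list_by_ticker:
        dest_key = f"{dest_prefix}/{str_ticker}.csv"  # hypothetical key layout
        body = df_by_ticker.to_csv(index=False).encode("utf-8")
        s3_client.put_object(Bucket=bucket, Key=dest_key, Body=body)
        uploaded_keys.append(dest_key)
    return uploaded_keys
```

Inside a Lambda handler, the client would typically be created once with `boto3.client("s3")` and passed to this function along with the bucket and key taken from the triggering event.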

Further research:

  1. Extend the groupby to Type (call/put)
  2. Extend the groupby to Expiration Date
  3. Extend the groupby to Strike Price
  4. Look for opportunities to "map" after "split"
  5. Look for opportunities to "reduce" after "shuffle"
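One possible direction for the items above, sketched under assumptions: passing a list of columns to `groupby` produces tuple keys, and a simple aggregation per group can stand in for a "reduce" step. The column names (`Ticker`, `Type`, `Expiration`, `Strike`, `Volume`) and the sample values are hypothetical, and summing volume is only an illustrative choice of reduction.

```python
import pandas as pd

# Hypothetical columns and values; the real file layout is not shown here.
df = pd.DataFrame({
    "Ticker": ["AAPL", "AAPL", "AAPL", "MSFT"],
    "Type": ["call", "put", "call", "call"],
    "Expiration": ["2024-01-19", "2024-01-19", "2024-02-16", "2024-01-19"],
    "Strike": [150.0, 145.0, 155.0, 310.0],
    "Volume": [10, 20, 30, 40],
})

# Grouping by several columns yields tuple keys: ((ticker, type), sub-dataframe).
list_by_ticker_type = list(df.groupby(["Ticker", "Type"]))
keys = [key for key, _ in list_by_ticker_type]

# A possible "reduce" step: collapse each group to a single summary value.
volume_by_group = {key: int(sub["Volume"].sum()) for key, sub in list_by_ticker_type}
```

Adding `"Expiration"` or `"Strike"` to the list of grouping columns extends the key tuple in the same way, at the cost of many more (and smaller) groups to upload.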
