Playbook for working with AWS Lambda, AWS S3 (e.g. CSV files), and pandas.
Pre-processing code for a longitudinal study of options data by ticker. Uses Python and AWS Lambda.
AWS Lambda code to split and shuffle end-of-day options data by ticker. Key lesson: a pandas groupby object can be converted into a list of (key, dataframe) tuples, because iterating over it yields one such pair per group.
This is similar in concept to the split-map-shuffle-reduce paradigm used in "big data" analysis.
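As a minimal sketch of the groupby-to-list conversion, assuming a column named `ticker` (the column names and sample rows here are made up for illustration and are not taken from the actual data):

```python
import pandas as pd

# Hypothetical end-of-day options rows; the real CSV columns may differ.
df = pd.DataFrame({
    "ticker": ["AAPL", "MSFT", "AAPL", "MSFT"],
    "type": ["call", "put", "put", "call"],
    "strike": [150.0, 300.0, 155.0, 310.0],
})

# Iterating over a groupby object yields (key, sub-dataframe) pairs,
# so list() turns it into a list of tuples, sorted by key by default.
pairs = list(df.groupby("ticker"))

for str_ticker, df_by_ticker in pairs:
    print(str_ticker, len(df_by_ticker))
```

Each `df_by_ticker` keeps the original columns and row index, so it can be written out on its own.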
Steps:
- Download CSV files from AWS S3
- Read CSV into pandas dataframe
- Split the dataframe into a list of tuples (str_ticker, df_by_ticker)
- Upload each per-ticker dataframe to AWS S3, keyed by ticker
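The steps above could be sketched roughly as follows. This is not the project's actual handler: the function name `split_and_upload`, the `by_ticker/` key layout, the `ticker` column name, the `DEST_BUCKET` environment variable, and the assumption that the Lambda is triggered by an S3 event are all hypothetical choices made for illustration.

```python
import io
import os
import urllib.parse

import pandas as pd


def split_and_upload(s3, src_bucket, src_key, dst_bucket, column="ticker"):
    """Download one CSV, split it by ticker, upload one CSV per ticker."""
    # Download the CSV from S3 and read it into a dataframe.
    body = s3.get_object(Bucket=src_bucket, Key=src_key)["Body"].read()
    df = pd.read_csv(io.BytesIO(body))

    file_name = src_key.rsplit("/", 1)[-1]
    uploaded = []
    # Split: the groupby object yields (ticker, df_by_ticker) tuples.
    for ticker, df_by_ticker in df.groupby(column):
        # Shuffle: route each piece to a per-ticker key (hypothetical layout).
        dst_key = f"by_ticker/{ticker}/{file_name}"
        payload = df_by_ticker.to_csv(index=False).encode("utf-8")
        s3.put_object(Bucket=dst_bucket, Key=dst_key, Body=payload)
        uploaded.append(dst_key)
    return uploaded


def lambda_handler(event, context):
    # boto3 is imported here so the helper above can be exercised without AWS.
    import boto3

    s3 = boto3.client("s3")
    # Assumes an S3 "object created" event; object keys arrive URL-encoded.
    record = event["Records"][0]["s3"]
    src_bucket = record["bucket"]["name"]
    src_key = urllib.parse.unquote_plus(record["object"]["key"])
    dst_bucket = os.environ["DEST_BUCKET"]  # hypothetical configuration
    return {"uploaded": split_and_upload(s3, src_bucket, src_key, dst_bucket)}
```

Passing the S3 client in as an argument keeps the split-and-upload logic testable with a stand-in object that provides `get_object` and `put_object`.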
Further research:
- Extend the groupby to Type (call/put)
- Extend the groupby to Expiration Date
- Extend the groupby to Strike Price
- Look for opportunities to "map" after "split"
- Look for opportunities to "reduce" after "shuffle"
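One possible direction for the items above, sketched under assumptions: grouping on several columns at once makes each key a tuple, which could be joined into a nested S3 prefix, and a reduce-style step could be an ordinary groupby aggregation. The column names (`type`, `expiration`, `strike`, `volume`), the sample rows, and the helper `key_to_prefix` are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; the real data layout may differ.
df = pd.DataFrame({
    "ticker": ["AAPL", "AAPL", "AAPL", "MSFT"],
    "type": ["call", "call", "put", "call"],
    "expiration": ["2024-01-19", "2024-01-19", "2024-01-19", "2024-02-16"],
    "strike": [150.0, 150.0, 150.0, 300.0],
    "volume": [10, 5, 7, 3],
})

group_cols = ["ticker", "type", "expiration", "strike"]

# Split on several columns: each key is a tuple with one value per column.
parts = {key: part for key, part in df.groupby(group_cols)}


def key_to_prefix(key):
    """Join a tuple key into a slash-separated prefix (hypothetical layout)."""
    return "/".join(str(value) for value in key)


prefixes = [key_to_prefix(key) for key in parts]

# Reduce-style step: total volume per ticker across all contracts.
volume_by_ticker = df.groupby("ticker")["volume"].sum()
```

Finer keys produce many more, smaller objects, so the number of S3 writes per invocation is worth watching before splitting this far.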