It's very frustrating that the zstd compression algorithm is not supported in the managed AWS Lambda layer. zstd is a widely supported codec in parquet, and its absence prevents Lambda workloads from reading zstd-compressed parquet sources or writing out to zstd.parquet.
I've tried building Lambda functions that bundle the whole awswrangler library directly, but the total package size exceeds what Lambda allows, and I keep running into roadblocks left and right. My source dataset is zstd-compressed parquet, and I'm attempting to use S3 Batch Operations to trigger a Lambda function that reads each object in, makes just a couple of changes, and writes it back out. With other datasets this approach has worked great and was cost-effective over large volumes, but I just can't get it to work here. Has anyone else had any experience or luck with this, or can you point me in a new direction?
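For context, here is a minimal sketch of the pattern I'm describing. It assumes the 250 MB zip-package limit is sidestepped by deploying the function as a container image (Lambda container images can be up to 10 GB, which comfortably fits awswrangler plus a pyarrow build that includes zstd). The S3 Batch Operations event/response field names follow the v1.0 invocation schema, and the `processed` column is just a stand-in for the actual changes:

```python
# Hypothetical handler: S3 Batch Operations invokes the function once per object;
# it reads a zstd-compressed parquet file, tweaks it, and writes it back.
# Assumes the container image bundles awswrangler with zstd-capable pyarrow.
import urllib.parse

import awswrangler as wr


def handler(event, context):
    # S3 Batch Operations sends one task per invocation (schema version 1.0).
    task = event["tasks"][0]
    bucket = task["s3BucketArn"].split(":::")[-1]
    key = urllib.parse.unquote_plus(task["s3Key"])  # keys arrive URL-encoded
    path = f"s3://{bucket}/{key}"

    # pyarrow detects the zstd codec from the parquet metadata on read.
    df = wr.s3.read_parquet(path=path)

    df["processed"] = True  # placeholder for the "couple of changes"

    # Write back with the same codec; this is the step that fails when the
    # bundled pyarrow lacks zstd support.
    wr.s3.to_parquet(df=df, path=path, compression="zstd")

    return {
        "invocationSchemaVersion": event["invocationSchemaVersion"],
        "treatMissingKeysAs": "PermanentFailure",
        "invocationId": event["invocationId"],
        "results": [
            {"taskId": task["taskId"], "resultCode": "Succeeded", "resultString": path}
        ],
    }
```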