Find a way to put the AMPCamp dataset into cluster HDFS #3

Open
irifed opened this issue Nov 12, 2014 · 0 comments

irifed commented Nov 12, 2014

The AMPCamp big data mini course uses a ~20GB dataset stored on S3.
This dataset has to be downloaded to the master's local disk and then put into cluster HDFS.
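
A minimal sketch of that baseline flow, assuming the AWS CLI and the Hadoop binaries are on the master's PATH (the bucket prefix and paths below are placeholders, not the real AMPCamp locations):

```python
import subprocess

# All locations below are placeholders -- the real AMPCamp S3 prefix and the
# cluster paths depend on the deployment.
S3_PREFIX = "s3://example-ampcamp-bucket/data/"   # hypothetical bucket
LOCAL_DIR = "/mnt/ampcamp-data"
HDFS_DIR = "/ampcamp-data"

def run(cmd):
    """Run a command on the master and fail loudly on a non-zero exit code."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Pull the ~20GB dataset from S3 onto the master's local disk.
run(["aws", "s3", "cp", "--recursive", S3_PREFIX, LOCAL_DIR])

# 2. Create the target directory in cluster HDFS and copy the files in.
run(["hdfs", "dfs", "-mkdir", "-p", HDFS_DIR])
run(["hdfs", "dfs", "-put", LOCAL_DIR, HDFS_DIR])
```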

Downloading from S3 to the local disk of a virtual instance on SoftLayer (SL) takes ~20 min.
Downloading the same dataset from SL Object Storage takes ~6 min, but there is a problem: SL Object Storage does not allow making an object public (as S3 does).
It is possible, though, to enable CDN for objects stored in SL so that they can be downloaded via an HTTP URL. However, in that case the dataset should be archived and stored as a single bulk file instead of the multiple separate files it was originally split into.
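
If the CDN route is taken, the load step could look roughly like the sketch below: fetch the single archive over HTTP, unpack it back into the original multi-file layout, and push it into HDFS. The CDN URL and paths are hypothetical; the real URL would come from the CDN settings of the SL Object Storage container.

```python
import subprocess
import tarfile
import urllib.request

# Hypothetical CDN URL for a single archived copy of the dataset; the real URL
# comes from the CDN configuration of the SL Object Storage container.
CDN_URL = "http://cdn.example.net/ampcamp-data.tar.gz"
ARCHIVE = "/mnt/ampcamp-data.tar.gz"
LOCAL_DIR = "/mnt/ampcamp-data"
HDFS_DIR = "/ampcamp-data"

# 1. Download the bulk archive over plain HTTP.
urllib.request.urlretrieve(CDN_URL, ARCHIVE)

# 2. Unpack it back into the original multi-file layout on the master's disk.
with tarfile.open(ARCHIVE, "r:gz") as tar:
    tar.extractall(LOCAL_DIR)

# 3. Load the extracted files into cluster HDFS.
subprocess.run(["hdfs", "dfs", "-mkdir", "-p", HDFS_DIR], check=True)
subprocess.run(["hdfs", "dfs", "-put", LOCAL_DIR, HDFS_DIR], check=True)
```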
