Find a way to put the AMPCamp dataset into cluster HDFS #3

Open
irifed opened this issue Nov 12, 2014 · 0 comments

irifed commented Nov 12, 2014

The AMPCamp big data mini course uses a ~20GB dataset stored on S3.
This dataset has to be downloaded to the master's local disk and then put into cluster HDFS.
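
A minimal sketch of that baseline flow, assuming the AWS CLI and the Hadoop binaries are on the master's PATH (the bucket prefix and paths below are placeholders, not the real AMPCamp locations):

```python
import subprocess

# All locations below are placeholders -- the real AMPCamp S3 prefix and the
# cluster paths depend on the deployment.
S3_PREFIX = "s3://example-ampcamp-bucket/data/"   # hypothetical bucket
LOCAL_DIR = "/mnt/ampcamp-data"
HDFS_DIR = "/ampcamp-data"

def run(cmd):
    """Run a command on the master and fail loudly on a non-zero exit code."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Pull the ~20GB dataset from S3 onto the master's local disk.
run(["aws", "s3", "cp", "--recursive", S3_PREFIX, LOCAL_DIR])

# 2. Create the target directory in cluster HDFS and copy the files in.
run(["hdfs", "dfs", "-mkdir", "-p", HDFS_DIR])
run(["hdfs", "dfs", "-put", LOCAL_DIR, HDFS_DIR])
```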

Downloading from S3 to the local disk of a virtual instance on SoftLayer (SL) takes ~20 min.
Downloading the same dataset from SL Object Storage takes ~6 min, but there is a problem: SL Object Storage does not allow making an object public (as S3 does).
It is possible, though, to enable CDN for objects stored in SL so that they can be downloaded via an HTTP URL. However, in that case the dataset should be archived and stored as a single bulk file instead of the multiple separate files it was originally split into.
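
If the CDN route is taken, the load step could look roughly like the sketch below: fetch the single archive over HTTP, unpack it back into the original multi-file layout, and push it into HDFS. The CDN URL and paths are hypothetical; the real URL would come from the CDN settings of the SL Object Storage container.

```python
import subprocess
import tarfile
import urllib.request

# Hypothetical CDN URL for a single archived copy of the dataset; the real URL
# comes from the CDN configuration of the SL Object Storage container.
CDN_URL = "http://cdn.example.net/ampcamp-data.tar.gz"
ARCHIVE = "/mnt/ampcamp-data.tar.gz"
LOCAL_DIR = "/mnt/ampcamp-data"
HDFS_DIR = "/ampcamp-data"

# 1. Download the bulk archive over plain HTTP.
urllib.request.urlretrieve(CDN_URL, ARCHIVE)

# 2. Unpack it back into the original multi-file layout on the master's disk.
with tarfile.open(ARCHIVE, "r:gz") as tar:
    tar.extractall(LOCAL_DIR)

# 3. Load the extracted files into cluster HDFS.
subprocess.run(["hdfs", "dfs", "-mkdir", "-p", HDFS_DIR], check=True)
subprocess.run(["hdfs", "dfs", "-put", LOCAL_DIR, HDFS_DIR], check=True)
```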
