Data Transfer between Clouds using rclone #3605
beepdot started this conversation in KnowledgeBase
Hello community, in this post I share some information on rclone and our data transfer strategy between clouds.
Data Copy
We are moving our data store from Azure Blob Storage to AWS S3 buckets. We have a lot of data and files to move over (around 2,000 TB of data with millions of files). I tried `rclone` to copy a sample set of data from Azure Storage to AWS S3 and it worked well. Here are some results: it took around 5 hours to copy approximately 1.26 TB of data from Azure Storage to S3. That sample contained 5,526,697 objects that needed to be copied over.
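For anyone trying to reproduce the baseline run: the post doesn't show the exact invocation, so here is a minimal sketch, assuming remotes named `azure:` and `s3:` configured via `rclone config` and placeholder container/bucket names:

```sh
# Baseline copy from an Azure Blob container to an S3 bucket.
# "azure:" and "s3:" are example remote names; the container and
# bucket names are placeholders.
rclone copy azure:source-container s3:destination-bucket \
  --progress \
  --stats 60s
```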
I did another run with updated parameters, which resulted in a better copy duration: the same ~1.26 TB of data with a little more than 5,526,697 objects took 3 hours 20 minutes.
Another run with further-tuned parameters brought the copy time down to 2 hours 58 minutes.
I did one more run with more aggressive parameters on an 8-core, 32 GB machine (these settings use more memory, around 26 GB, so we needed to increase the VM size). The transfer time dropped to just 1 hour and 40 minutes.
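The post doesn't list the exact parameter values, but in rclone the knobs that typically drive this kind of speedup (and the extra memory usage) are `--transfers` and `--checkers`. A tuned run might look roughly like this; the values are illustrative assumptions, not the original numbers:

```sh
# Higher parallelism: more concurrent file transfers and more
# concurrent checkers. Memory usage grows with both values.
rclone copy azure:source-container s3:destination-bucket \
  --transfers 256 \
  --checkers 512 \
  --progress
```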
I did one last run with even higher parameter values on the same 8-core, 32 GB machine. In that run the load average crossed 8 and reached around 12 (on an 8-core VM), which is probably why the copy time went up. I then reran it on an upgraded VM (16 cores and 64 GB RAM).
From this point onwards, I started to see S3 errors while copying, with response code `503` and the error message `Failed to copy: SlowDown: Please reduce your request rate`.

It looks like with the upgraded VM and 1024 threads we are hitting the S3 request-rate limit of 3,500 PUT/COPY/POST/DELETE requests per second per prefix. So we end the S3 test here for now; in future, we might pick it up again from this point to see if we can get the copy operation completed in under 60 minutes.
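One possible way to continue past this point (not something tried in the post, just an idea) is to cap rclone's overall request rate with `--tpslimit` so it stays under the per-prefix limit instead of getting throttled:

```sh
# Cap HTTP transactions per second so S3 stops returning 503 SlowDown.
# 3000/sec leaves some headroom under the 3,500/sec per-prefix limit.
rclone copy azure:source-container s3:destination-bucket \
  --transfers 256 \
  --tpslimit 3000 \
  --tpslimit-burst 100
```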
Some more details about the VM in case it's useful for someone: the open-files ulimit is set to 65K on the VM so that we don't run into max-open-files issues.
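For reference, one common way to raise the open-file limit on Linux (the post only mentions the 65K value; the exact method used is an assumption):

```sh
# Raise the open-file limit for the current shell session.
ulimit -n 65535

# To make it persistent for the user running rclone, add entries to
# /etc/security/limits.conf (takes effect on next login):
#   rcloneuser  soft  nofile  65535
#   rcloneuser  hard  nofile  65535
```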
Data Sync
In the first section, I was able to copy the data in 1 hour 40 minutes. The use case now is that I need to keep both cloud stores (Azure Blob and AWS S3) in sync. I ran the same `rclone sync` command that I used in the first section, and it took more than 10 hours to complete. I was hoping the sync operation would be much faster than the initial copy. (Even if we used `1024` as the number of threads for the sync operation, it would probably still take almost the same time as the copy; I haven't tried that bigger number though.)

I tried the `--fast-list` and `--max-age` options, independently as well as combined, and could not see any improvement in sync time. I suspect the sync is slow because it needs to scan and compare every object in the source and destination and only then decide whether each one needs to be copied.
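For context, the sync invocation was along these lines; this is a sketch, since the post only names `--fast-list` and `--max-age`, and the remote names and other values are assumptions:

```sh
# Full sync: rclone lists and compares every object on both sides
# before transferring, which is what makes it slow on ~5.5M objects.
rclone sync azure:source-container s3:destination-bucket \
  --fast-list \
  --max-age 24h \
  --transfers 256 \
  --checkers 512
```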
So, to achieve a faster sync, we have come up with a new strategy:

- Write newly created objects to a new (staging) storage account instead of the original one.
- Use the `rclone move` command to move the newly created objects from the new storage account to AWS S3.
- `rclone move` is fast, since the number of objects is low (replication happens only for the newly created objects).
- Because `rclone move` deletes objects from the source after they are copied, on any given day we will have only a limited number of objects to move over.

If anyone has other solutions or recommendations for syncing the objects quickly, with or without additional steps (object replication etc.), do let me know. If we can also keep the source and destination consistent with respect to object deletions, that would be the cherry on top.
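To make the daily move concrete, here is a minimal sketch of the kind of job we have in mind; the remote names, paths, and schedule are assumptions:

```sh
#!/bin/sh
# Daily job: move newly created objects from the staging storage
# account to S3. `rclone move` deletes each source object after a
# successful copy, so the staging container stays small.
rclone move azure-new:staging-container s3:destination-bucket \
  --transfers 64 \
  --checkers 128 \
  --log-file /var/log/rclone-move.log

# Example crontab entry to run this every day at 01:00:
#   0 1 * * * /usr/local/bin/rclone-daily-move.sh
```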