As part of our work generating All of Us datasets, we needed to copy around a million GCS objects. Our Copier infrastructure 'should' be able to handle that, but it kept failing with robustness issues. What finally worked was using GCS's rewrite API. This let us copy data without reading it, so the copies completed in a fraction of the time while also reducing bandwidth needs.
There are two components to this:

1. Research what specific APIs we can take advantage of.
2. Update our code to use them when we can, for the Copier and the new sync tool ([fs] basic sync tool #14248).
Here's the code I used for making the rewrite requests when merging a set of matrix tables together; the progress bar code was for visibility.
Were you using the old copier or the new (not yet merged) hailctl fs sync? I had hoped the latter was finally robust enough for real use. hailtop.aiotools.copy is indeed not very reliable. Regardless, using the rewrite action when the source and destination agree is the correct move.
We used a one-off script. We tried Copier.copy, but it wasn't reliable enough. We also needed to rename destination files in ways that the sync (or copy) tool doesn't support.
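For anyone picking this up, here is a minimal sketch of what a rewrite loop can look like against the GCS JSON API. It is not the script used for the All of Us copy. The helper names (`parse_gs_url`, `rewrite_url`, `rewrite_object`) and the injected `post` callable are hypothetical, chosen so that the token-handling logic can be exercised without credentials. The endpoint shape (`.../rewriteTo/...`) and the `done`, `rewriteToken`, and `totalBytesRewritten` response fields come from the `objects.rewrite` method.

```python
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import quote, urlencode

API_ROOT = "https://storage.googleapis.com/storage/v1"


def parse_gs_url(url: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path/to/object' into (bucket, object name)."""
    if not url.startswith("gs://"):
        raise ValueError(f"not a gs:// URL: {url}")
    bucket, _, name = url[len("gs://"):].partition("/")
    if not bucket or not name:
        raise ValueError(f"missing bucket or object name: {url}")
    return bucket, name


def rewrite_url(src_bucket: str, src_name: str,
                dst_bucket: str, dst_name: str,
                token: Optional[str] = None) -> str:
    """Build the objects.rewrite URL; object names must be percent-encoded."""
    url = (f"{API_ROOT}/b/{quote(src_bucket, safe='')}/o/{quote(src_name, safe='')}"
           f"/rewriteTo/b/{quote(dst_bucket, safe='')}/o/{quote(dst_name, safe='')}")
    if token is not None:
        url += "?" + urlencode({"rewriteToken": token})
    return url


def rewrite_object(post: Callable[[str], Dict], src: str, dst: str) -> int:
    """Copy src to dst server-side and return the number of bytes rewritten.

    `post` is a hypothetical callable that sends an authenticated POST to the
    given URL and returns the decoded JSON response. Large objects may need
    several calls, so we resend with the returned token until `done` is true.
    """
    src_bucket, src_name = parse_gs_url(src)
    dst_bucket, dst_name = parse_gs_url(dst)
    token = None
    while True:
        resp = post(rewrite_url(src_bucket, src_name, dst_bucket, dst_name, token))
        if resp["done"]:
            return int(resp.get("totalBytesRewritten", 0))
        token = resp["rewriteToken"]
```

In real use, `post` would wrap an authenticated HTTP session and handle retries. Since the destination name is just an argument, the renaming we needed falls out for free. The same loop could serve as the fast path in the Copier and sync tool whenever both ends are GCS URLs.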