Can't create indexes from un-seekable fileobj's #95
Comments
I don't understand the use case. Why do you need an index for seeking when you can't seek on the file?
@mxmlnkn Here's the use case: we can't seek during index creation, but we can seek once the index is available. CodaLab allows users to upload files to Azure Blob Storage. During the upload, we want to 1) upload the file to Blob Storage and 2) create the index. Once the file is on Azure Blob Storage, users can download particular parts of it using the ratarmount index. However, during the upload we are often just streaming a directory as a .tar.gz, so the fileobj is un-seekable. It would be nice to feed this stream directly into ratarmount so that the index is created at the same time. Right now, we first 1) upload the file to Blob Storage and then 2) download the entire file again so that ratarmount can create the index, which is slow and inefficient. Ideally, we could do both at once and not have to re-download the entire file just to create the index.
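To make the "do both at once" idea concrete, here is a minimal sketch using only Python's standard `tarfile` module in stream mode (`"r|gz"`), not ratarmount's actual API. The names `TeeReader` and `index_while_streaming` are hypothetical, and the `sink` stands in for whatever performs the Blob Storage upload. It only records member offsets within the decompressed tar stream; it does not build the gzip seek points that indexed_gzip would provide.

```python
import io
import tarfile


class TeeReader:
    """Hypothetical forward-only reader: every chunk read is also written to `sink`."""

    def __init__(self, source, sink):
        self._source = source
        self._sink = sink

    def read(self, size=-1):
        chunk = self._source.read(size)
        self._sink.write(chunk)  # e.g. forward the chunk to the upload
        return chunk


def index_while_streaming(source, sink):
    """Return (name, data offset, size) per member while copying the stream to `sink`."""
    index = []
    tee = TeeReader(source, sink)
    # "r|gz" is tarfile's stream mode: it only calls read(), never seek().
    with tarfile.open(fileobj=tee, mode="r|gz") as archive:
        for member in archive:
            index.append((member.name, member.offset_data, member.size))
    # tarfile may stop before the end of the compressed stream, so drain the
    # remainder to make sure the sink receives every byte.
    while tee.read(64 * 1024):
        pass
    return index


# Build a small .tar.gz in memory to stand in for the upload stream.
original = io.BytesIO()
with tarfile.open(fileobj=original, mode="w:gz") as archive:
    for name, data in [("a.txt", b"hello"), ("b.txt", b"streaming")]:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))

uploaded = io.BytesIO()
index = index_while_streaming(io.BytesIO(original.getvalue()), uploaded)
```

Since `TeeReader` exposes nothing but `read()`, the archive is consumed in a single forward pass, which is the constraint a .tar.gz upload stream imposes.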
Looks like we might need to resolve pauldmccarthy/indexed_gzip#102 first though...
I was going to write something like that before you did ;). And it's not only In the end, I agree that it might be useful but it seems difficult to implement with all of ratarmount features. It might be implemented as a separate "function" on top of ratarmountcore because you wouldn't even need FUSE for that. I imagine something like Currently, I won't be able to work on this in a timely manner. I will review PRs though if you find the time to take a deeper look into it. Edit: One such mentioned problematic ratarmount option for streaming support could be |
That works!! Thanks. By the way, pauldmccarthy/indexed_gzip#102 is now resolved.
In theory, it should be possible to create indexes from un-seekable archive fileobj's (by using `peek` instead of `read` to check file headers, for example). I was able to get this to work by modifying an older version of ratarmount core here (https://github.com/codalab/codalab-worksheets/pull/4212/files#diff-ad5ad76eb55b6437b2e1aa24b324c6d11d176be1926ebea1950a006bfce4efbe). It would be nice if we could do the same for the current version of ratarmountcore, though the current version seems to rely a lot more on `seek`.
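As a rough illustration of the `peek`-instead-of-`read` idea (not the code from the linked PR, and not ratarmountcore's implementation), the sketch below checks the gzip magic bytes on an un-seekable stream without consuming them. `UnseekableStream` and `looks_like_gzip` are hypothetical names made up for this example; only `io.BufferedReader.peek` and `gzip.GzipFile` are standard library calls.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"


class UnseekableStream(io.RawIOBase):
    """Hypothetical stand-in for a network stream: readable, but not seekable."""

    def __init__(self, data):
        self._inner = io.BytesIO(data)

    def readable(self):
        return True

    def seekable(self):
        return False

    def readinto(self, buffer):
        return self._inner.readinto(buffer)


def looks_like_gzip(buffered):
    # peek() returns buffered bytes without advancing the stream position, so
    # the header is still there for the decompressor afterwards. It may return
    # fewer bytes than requested on a slow stream; a robust check would retry.
    return buffered.peek(len(GZIP_MAGIC))[: len(GZIP_MAGIC)] == GZIP_MAGIC


payload = b"hello from an un-seekable stream"
stream = io.BufferedReader(UnseekableStream(gzip.compress(payload)))

is_gzip = looks_like_gzip(stream)                        # header inspected, nothing consumed
decompressed = gzip.GzipFile(fileobj=stream).read()      # still sees the full stream
```

With `read` instead of `peek`, the two magic bytes would be gone by the time the decompressor starts, and without `seek` there would be no way to put them back.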