Large files corrupted after sync by desktop clients #114

Open
rpezzi opened this issue Aug 27, 2020 · 9 comments

Comments

@rpezzi

rpezzi commented Aug 27, 2020

Hi.

On my setup, large files uploaded by desktop clients get corrupted in the Davros app (observed for files of 500 MB and 1 GB).

Soon after a large file is uploaded to Davros by a desktop client, a corrupted version of the file is downloaded back to the PC by the client, overwriting the original file. I reproduced this behavior with Davros on both a private Sandstorm instance and a demo instance at alpha.sandstorm.io. The same problem occurs with the NextCloud desktop client 2.5.1 (official Debian repository) and the latest ownCloud client (2.6.3), both running on Debian 9. This was present on Davros 0.26.1 and persists on 0.27.3.

No problem if the file is uploaded through the Sandstorm web interface.

By the way, thanks for giving more life to Davros.

@ocdtrekkie
Contributor

@zenhack Is this related to sandstorm-io/sandstorm#2464, or is it likely Davros-specific?

@zenhack

zenhack commented Aug 27, 2020

I poked around a bit, and I think that in theory the NextCloud desktop client should handle a server that doesn't support range requests OK:

https://github.com/nextcloud/desktop/blob/56a6fe4731a8d71c314c44822a960a66acfb9f67/src/libsync/propagatedownload.cpp#L205

That said, it's entirely possible that that logic isn't super battle-tested, or that I've misunderstood something.
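
For context, the fallback in question boils down to: ask for a byte range, and if the server replies 200 (full content) instead of 206 (partial content), restart the download from scratch. Below is a minimal sketch of that pattern; it is not the NextCloud client's actual code (that's C++/Qt), just an illustration using Node's built-in fetch, and the function name is made up.

```typescript
// Illustrative only: resume a download from `offset`, falling back to a full
// restart if the server ignores the Range header and replies 200 instead of 206.
import { createWriteStream } from "node:fs";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";

async function resumeOrRestart(url: string, dest: string, offset: number): Promise<void> {
  const res = await fetch(url, { headers: { Range: `bytes=${offset}-` } });

  if (res.status === 206) {
    // Server honored the Range request: append only the remaining bytes.
    await pipeline(Readable.fromWeb(res.body as any), createWriteStream(dest, { flags: "a" }));
  } else if (res.status === 200) {
    // Server sent the whole file: discard the partial copy and start over.
    await pipeline(Readable.fromWeb(res.body as any), createWriteStream(dest));
  } else {
    throw new Error(`unexpected status ${res.status} for ${url}`);
  }
}
```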

@mnutt
Owner

mnutt commented Aug 17, 2021

FWIW I don't know that ownCloud/NextCloud clients typically use Range requests; from what I remember they upload hundreds of smaller files and then send a command to finalize and concatenate them together. I tested Davros itself (sans Sandstorm) by manually hashing a particular 2 GB file before and after upload and wasn't able to reproduce, unfortunately. I'll take another look at that concatenation code and see if I can reproduce it in Sandstorm.
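
For anyone who wants to repeat that check, a minimal sketch of the before/after comparison (assuming MD5; the comment above doesn't say which hash was used, and the file paths and function name are made up):

```typescript
// Stream-hash a large file so a 2GB file doesn't have to fit in memory.
// (Illustrative only; file paths and the function name are made up.)
import { createHash } from "node:crypto";
import { createReadStream } from "node:fs";

function md5OfFile(path: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const hash = createHash("md5");
    createReadStream(path)
      .on("data", (chunk) => hash.update(chunk))
      .on("error", reject)
      .on("end", () => resolve(hash.digest("hex")));
  });
}

// Compare the local original against the copy the client syncs back down:
// const before = await md5OfFile("original.bin");
// const after = await md5OfFile("synced-back.bin");
// console.log(before === after ? "match" : "corrupted");
```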

@mnutt
Owner

mnutt commented Aug 17, 2021

I was able to reproduce on Sandstorm, but I don't think it's a Sandstorm problem. Reading the NextCloud docs above, I think I see the likely cause, which unfortunately was undocumented at the time I wrote the upload chunking.

Originally, the ownCloud client would pick a chunk size prior to starting the upload, and upload chunks in order. While ownCloud's own implementation saved the chunks to a temporary directory and then concatenated them together, mine aimed to be a bit more efficient: I had just one tempfile, and fseek()'d to the calculated offset to write each chunk. This meant we didn't temporarily need double the file's size, finalizing was very cheap, and there wasn't a bunch of little files.

My guess is that the client has started adaptively changing the chunk size, which makes the offset calculation (chunk number * chunk size) wrong. The fact that the corrupted file has the right filesize supports this.
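
A hedged sketch of the legacy write path being described, not Davros's actual code (names and sizes are illustrative), to show why a fixed assumed chunk size breaks once the client varies it:

```typescript
// Sketch of the legacy write path: the server only knows the chunk number,
// so the write offset has to be derived from an assumed, fixed chunk size.
import { open } from "node:fs/promises";

const ASSUMED_CHUNK_SIZE = 10 * 1024 * 1024; // what the server believes every chunk is

async function writeChunk(tempfilePath: string, chunkNumber: number, data: Buffer): Promise<void> {
  // Wrong as soon as the client changes its chunk size mid-upload.
  const offset = chunkNumber * ASSUMED_CHUNK_SIZE;
  const fh = await open(tempfilePath, "r+"); // tempfile created when the upload started
  try {
    await fh.write(data, 0, data.length, offset); // the fseek()-and-write equivalent
  } finally {
    await fh.close();
  }
}

// If the client switches from 10MB to 5MB chunks partway through, later chunks
// land at the wrong offsets, yet the total byte count (and so the final file
// size) still comes out right -- matching the symptom described above.
```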

mnutt added a commit to mnutt/sandstorm that referenced this issue Aug 17, 2021
Adds some additional ownCloud/NextCloud client headers relating to mnutt/davros#114.

The new headers:
* `oc-checksum` - a hash (typically `SHA1`) of the entire file. After all chunks have been uploaded, Davros should check this to ensure the file uploaded correctly; otherwise we may end up saving a corrupted file. [this header has been around for a while, I should have been doing this all along; a sketch of the check follows this list]
* `oc-chunk-offset` - part of a new chunking scheme. Right now the legacy chunking scheme relies on all chunks being the same size in order to calculate the offset (number * chunkSize); this header would allow the client to vary the chunk size based on timing/bandwidth.
* `oc-lazyops` - I'm unlikely to implement this soon, but it would be a way for the client to signal "I don't want to wait for a response". The server would then hypothetically respond with a location where the client could poll for progress updates.
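
As noted in the `oc-checksum` item above, here is a rough sketch of what that final verification could look like. It is a hedged illustration, not Davros's implementation: it assumes the header value has the form `ALGORITHM:hexdigest` (e.g. `SHA1:…`), and the function name is made up.

```typescript
// Hedged sketch: verify the client-supplied checksum once all chunks are
// assembled. Assumes the header value looks like "SHA1:<hex digest>".
import { createHash } from "node:crypto";
import { createReadStream } from "node:fs";

async function verifyOcChecksum(assembledPath: string, ocChecksum: string): Promise<boolean> {
  const [algo, expected] = ocChecksum.split(":");
  const hash = createHash(algo.toLowerCase()); // e.g. "sha1"
  for await (const chunk of createReadStream(assembledPath)) {
    hash.update(chunk as Buffer);
  }
  return hash.digest("hex") === expected.toLowerCase();
}

// If this returns false, the request finalizing the upload should fail
// (e.g. respond 4xx) instead of silently keeping a corrupted file.
```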
@mnutt
Owner

mnutt commented Aug 19, 2021

Once Sandstorm allows a few extra request headers through, I'll be able to checksum the final result and ensure that at least no data is ever corrupted. But it wouldn't actually solve the underlying issue, which I think may be pieces coming in out of order. The easiest way to solve that is to opt into the newer NextCloud/ownCloud chunking mechanism that passes an oc-file-offset header, which means we can always write the resulting file in the right order. That will be coming shortly after the Sandstorm header changes.
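
A hedged sketch of what that newer write path might look like: trust a client-supplied byte offset instead of deriving one from the chunk number, so chunks may vary in size and arrive in any order. The function name is made up, and the header name follows the `oc-chunk-offset` spelling from the commit message above (this comment calls it oc-file-offset); this is not the actual Davros change.

```typescript
// Hedged sketch: write each chunk at the offset the client supplies.
import { open } from "node:fs/promises";
import type { IncomingHttpHeaders } from "node:http";

async function writeChunkAtOffset(tempfilePath: string, headers: IncomingHttpHeaders, body: Buffer): Promise<void> {
  const offset = Number(headers["oc-chunk-offset"]); // byte offset within the final file
  if (!Number.isInteger(offset) || offset < 0) {
    throw new Error("missing or invalid chunk offset header");
  }
  const fh = await open(tempfilePath, "r+");
  try {
    await fh.write(body, 0, body.length, offset);
  } finally {
    await fh.close();
  }
}
```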

@ocdtrekkie
Contributor

@rpezzi I believe this should be resolved by Davros 0.31.1, currently in the experimental market: https://apps.sandstorm.io/app/8aspz4sfjnp8u89000mh2v1xrdyx97ytn8hq71mdzv4p4d8n0n3h?experimental=true

If you get a chance to test this version in a way that doesn't put your production data at risk, that would be really helpful!

@orblivion

I noticed an issue with large file corruption that I wanted to report. I'm putting it here in case it's related, but I'd be happy to make a new issue.

I uploaded a ~1GB zip file successfully. I went as far as downloading it and comparing the md5sum with what I'd uploaded.

But then I gave it to my friend. The md5sum didn't match for him on his first try. He tried downloading again and it matched. I'll give his full report:

> I downloaded the file, but the checksum did not match. I got <redacted> for the md5 hash.
>
> File size was 1649122062 bytes, but the web UI reports it as 1GB. Perhaps a rounding error :).
>
> I tried once again, and I'm not sure if I clicked something different, but I made sure to click on "Download Original". One thing I observed: the download got interrupted and needed to resume at least once.
>
> That 2nd download completed and the checksum matched. Thanks!

@ocdtrekkie
Contributor

Did the download that didn't complete come from the web interface or a sync client? I'm wondering if the grain managed to terminate during the download somehow.

@orblivion

Oh right, to be clear this was all web.
