Large files corrupted after sync by desktop clients #114

Open
rpezzi opened this issue Aug 27, 2020 · 9 comments

Comments

@rpezzi

rpezzi commented Aug 27, 2020

Hi.

On my setup, large files uploaded by desktop clients get corrupted in the Davros app (observed for files of 500 MB and 1 GB).

Soon after a large file is uploaded to Davros by a desktop client, a corrupted version of the file is downloaded back to the PC by the client, overwriting the original file. I reproduced this behavior with Davros on both a private Sandstorm instance and a demo instance at alpha.sandstorm.io. The same problem occurs with the NextCloud desktop client 2.5.1 (official Debian repository) and the latest ownCloud client (2.6.3), both running on Debian 9. This was present on Davros 0.26.1 and persists on 0.27.3.

No problem if the file is uploaded through the Sandstorm web interface.

By the way, thanks for giving more life to Davros.

@ocdtrekkie
Contributor

@zenhack Is this related to sandstorm-io/sandstorm#2464, or is it likely Davros-specific?

@zenhack

zenhack commented Aug 27, 2020

I poked around a bit, and I think that in theory the NextCloud desktop client should handle a server that doesn't support range requests OK:

https://github.com/nextcloud/desktop/blob/56a6fe4731a8d71c314c44822a960a66acfb9f67/src/libsync/propagatedownload.cpp#L205

That said, it's entirely possible that that logic isn't super battle-tested, or that I've misunderstood something.
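
For context, the fallback in question boils down to: ask for a byte range, and if the server replies 200 (full content) instead of 206 (partial content), restart the download from scratch. Below is a minimal sketch of that pattern; it is not the NextCloud client's actual code (that's C++/Qt), just an illustration using Node's built-in fetch, and the function name is made up.

```typescript
// Illustrative only: resume a download from `offset`, falling back to a full
// restart if the server ignores the Range header and replies 200 instead of 206.
import { createWriteStream } from "node:fs";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";

async function resumeOrRestart(url: string, dest: string, offset: number): Promise<void> {
  const res = await fetch(url, { headers: { Range: `bytes=${offset}-` } });

  if (res.status === 206) {
    // Server honored the Range request: append only the remaining bytes.
    await pipeline(Readable.fromWeb(res.body as any), createWriteStream(dest, { flags: "a" }));
  } else if (res.status === 200) {
    // Server sent the whole file: discard the partial copy and start over.
    await pipeline(Readable.fromWeb(res.body as any), createWriteStream(dest));
  } else {
    throw new Error(`unexpected status ${res.status} for ${url}`);
  }
}
```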

@mnutt
Owner

mnutt commented Aug 17, 2021

FWIW I don't know that ownCloud/NextCloud clients typically use Range requests; from what I remember they upload hundreds of smaller files and then send a command to finalize and concatenate them together. I tested Davros itself (sans Sandstorm) by manually hashing a particular 2 GB file before and after upload and wasn't able to reproduce, unfortunately. I'll take another look at that concatenation code and see if I can reproduce it in Sandstorm.
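
For anyone who wants to repeat that check, a minimal sketch of the before/after comparison (assuming MD5; the comment above doesn't say which hash was used, and the file paths and function name are made up):

```typescript
// Stream-hash a large file so a 2GB file doesn't have to fit in memory.
// (Illustrative only; file paths and the function name are made up.)
import { createHash } from "node:crypto";
import { createReadStream } from "node:fs";

function md5OfFile(path: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const hash = createHash("md5");
    createReadStream(path)
      .on("data", (chunk) => hash.update(chunk))
      .on("error", reject)
      .on("end", () => resolve(hash.digest("hex")));
  });
}

// Compare the local original against the copy the client syncs back down:
// const before = await md5OfFile("original.bin");
// const after = await md5OfFile("synced-back.bin");
// console.log(before === after ? "match" : "corrupted");
```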

@mnutt
Owner

mnutt commented Aug 17, 2021

I was able to reproduce on Sandstorm, but I don't think it's a Sandstorm problem. Reading the NextCloud docs above, I think I see the likely cause, which unfortunately was undocumented at the time I wrote the upload chunking.

Originally, the ownCloud client would pick a chunk size prior to starting the upload, and upload chunks in order. While ownCloud's own implementation saved the chunks to a temporary directory and then concatenated them together, mine aimed to be a bit more efficient: I had just one tempfile, and fseek()'d to the calculated offset to write each chunk. This meant we didn't temporarily need double the file's size, finalizing was very cheap, and there wasn't a bunch of little files.

My guess is that the client has started adaptively changing the chunk size, which makes the offset calculation (chunk number * chunk size) wrong. The fact that the corrupted file has the right filesize supports this.
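
A hedged sketch of the legacy write path being described, not Davros's actual code (names and sizes are illustrative), to show why a fixed assumed chunk size breaks once the client varies it:

```typescript
// Sketch of the legacy write path: the server only knows the chunk number,
// so the write offset has to be derived from an assumed, fixed chunk size.
import { open } from "node:fs/promises";

const ASSUMED_CHUNK_SIZE = 10 * 1024 * 1024; // what the server believes every chunk is

async function writeChunk(tempfilePath: string, chunkNumber: number, data: Buffer): Promise<void> {
  // Wrong as soon as the client changes its chunk size mid-upload.
  const offset = chunkNumber * ASSUMED_CHUNK_SIZE;
  const fh = await open(tempfilePath, "r+"); // tempfile created when the upload started
  try {
    await fh.write(data, 0, data.length, offset); // the fseek()-and-write equivalent
  } finally {
    await fh.close();
  }
}

// If the client switches from 10MB to 5MB chunks partway through, later chunks
// land at the wrong offsets, yet the total byte count (and so the final file
// size) still comes out right -- matching the symptom described above.
```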

mnutt added a commit to mnutt/sandstorm that referenced this issue Aug 17, 2021
Adds some additional ownCloud/NextCloud client headers relating to mnutt/davros#114.

The new headers:
* `oc-checksum` - a hash (typically `SHA1`) of the entire file. After all chunks have been uploaded, Davros should check this to ensure the file uploaded correctly; otherwise we may end up saving a corrupted file. [this header has been around for a while, I should have been doing this all along; a sketch of the check follows this list]
* `oc-chunk-offset` - part of a new chunking scheme. Right now the legacy chunking scheme relies on all chunks being the same size in order to calculate the offset (number * chunkSize); this header would allow the client to vary the chunk size based on timing/bandwidth.
* `oc-lazyops` - I'm unlikely to implement this soon, but it would be a way for the client to signal "I don't want to wait for a response". The server would then hypothetically respond with a location where the client could poll for progress updates.
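
As noted in the `oc-checksum` item above, here is a rough sketch of what that final verification could look like. It is a hedged illustration, not Davros's implementation: it assumes the header value has the form `ALGORITHM:hexdigest` (e.g. `SHA1:…`), and the function name is made up.

```typescript
// Hedged sketch: verify the client-supplied checksum once all chunks are
// assembled. Assumes the header value looks like "SHA1:<hex digest>".
import { createHash } from "node:crypto";
import { createReadStream } from "node:fs";

async function verifyOcChecksum(assembledPath: string, ocChecksum: string): Promise<boolean> {
  const [algo, expected] = ocChecksum.split(":");
  const hash = createHash(algo.toLowerCase()); // e.g. "sha1"
  for await (const chunk of createReadStream(assembledPath)) {
    hash.update(chunk as Buffer);
  }
  return hash.digest("hex") === expected.toLowerCase();
}

// If this returns false, the request finalizing the upload should fail
// (e.g. respond 4xx) instead of silently keeping a corrupted file.
```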
@mnutt
Owner

mnutt commented Aug 19, 2021

Once Sandstorm allows a few extra request headers through, I'll be able to checksum the final result and ensure that at least no data is ever corrupted. But it wouldn't actually solve the underlying issue, which I think may be pieces coming in out of order. The easiest way to solve that is to opt into the newer NextCloud/ownCloud chunking mechanism that passes an oc-file-offset header, which means we can always write the resulting file in the right order. That will be coming shortly after the Sandstorm header changes.
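
A hedged sketch of what that newer write path might look like: trust a client-supplied byte offset instead of deriving one from the chunk number, so chunks may vary in size and arrive in any order. The function name is made up, and the header name follows the `oc-chunk-offset` spelling from the commit message above (this comment calls it oc-file-offset); this is not the actual Davros change.

```typescript
// Hedged sketch: write each chunk at the offset the client supplies.
import { open } from "node:fs/promises";
import type { IncomingHttpHeaders } from "node:http";

async function writeChunkAtOffset(tempfilePath: string, headers: IncomingHttpHeaders, body: Buffer): Promise<void> {
  const offset = Number(headers["oc-chunk-offset"]); // byte offset within the final file
  if (!Number.isInteger(offset) || offset < 0) {
    throw new Error("missing or invalid chunk offset header");
  }
  const fh = await open(tempfilePath, "r+");
  try {
    await fh.write(body, 0, body.length, offset);
  } finally {
    await fh.close();
  }
}
```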

@ocdtrekkie
Contributor

@rpezzi I believe this should be resolved by Davros 0.31.1, currently in the experimental market: https://apps.sandstorm.io/app/8aspz4sfjnp8u89000mh2v1xrdyx97ytn8hq71mdzv4p4d8n0n3h?experimental=true

If you get a chance to test this version in a way that doesn't put your production data at risk, that would be really helpful!

@orblivion

I noticed an issue with large file corruption that I wanted to report. I'm putting it here in case it's related, but I'd be happy to make a new issue.

I uploaded a ~1GB zip file successfully. I went as far as downloading it and comparing the md5sum with what I'd uploaded.

But then I gave it to my friend. The md5sum didn't match for him on his first try. He tried downloading again and it matched. I'll give his full report:

> I downloaded the file, but the checksum did not match. I got <redacted> for the md5 hash.
>
> File size was 1649122062 bytes, but the web UI reports it as 1GB. Perhaps a rounding error :).
>
> I tried once again, and I'm not sure if I clicked something different, but I made sure to click on "Download Original". One thing I observed: the download got interrupted and needed to resume at least once.
>
> That 2nd download completed and the checksum matched. Thanks!

@ocdtrekkie
Contributor

Did the download that didn't complete come from the web interface or a sync client? I'm wondering if the grain managed to terminate during the download somehow.

@orblivion

Oh right, to be clear this was all web.
