Replace MD5 with xxhash #16

Open
py0xc3 opened this issue Sep 13, 2022 · 6 comments

Comments

@py0xc3

py0xc3 commented Sep 13, 2022

DNF still uses MD5 for error detection after downloading. I suggest replacing MD5 with xxhash or another comparable algorithm, which should improve performance. Given that OpenPGP-based cryptographic authentication/verification happens later anyway, xxhash64 is sufficient for error detection. Even with gpgcheck=0, I expect xxhash64 to be sufficient for this purpose.
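A download check of this kind only has to detect accidental corruption, not tampering. The following Python sketch illustrates the idea: it streams a file through a hash chosen by name and compares the result with an expected digest. The helpers `file_digest` and `is_intact` are hypothetical and not taken from this repository or from dnf. The sketch uses only the standard-library `hashlib`; an xxhash64 variant would need a third-party xxhash binding, which is assumed and not shown here.

```python
import hashlib
import io
from typing import BinaryIO


# Hypothetical helpers for illustration; not deltarpm or dnf code.
def file_digest(stream: BinaryIO, algorithm: str = "md5", chunk_size: int = 1 << 16) -> str:
    """Hash a stream in fixed-size chunks so large packages need not fit in memory."""
    h = hashlib.new(algorithm)
    # Read until the stream returns an empty bytes object (end of file).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def is_intact(stream: BinaryIO, expected_hex: str, algorithm: str = "md5") -> bool:
    """Error detection only: a matching digest says nothing about authenticity."""
    return file_digest(stream, algorithm) == expected_hex.lower()


payload = b"example package payload"
expected = hashlib.md5(payload).hexdigest()
print(is_intact(io.BytesIO(payload), expected))          # True
print(is_intact(io.BytesIO(payload + b"x"), expected))   # False
```

Because the algorithm is just a name passed to `hashlib.new`, swapping MD5 for another digest changes one argument, provided the expected digests are recorded with the same algorithm.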

On 64-bit architectures, xxhash64 already gives a noteworthy performance improvement in Btrfs checksumming compared to CRC32C (which is itself generally faster than MD5); in Btrfs, xxhash64 is one of the four supported checksum algorithms. Unlike CRC32 or MD5, xxhash is specifically designed for modern architectures. Since MD5 was designed as a cryptographic hash (a role it no longer fulfills anyway), it is unlikely to have a performance advantage over xxhash on any architecture in use today.

If keeping a 128-bit digest length is easier for development, there is also xxhash128.

Furthermore, users notice the use of MD5 through the "md5 mismatch of result" message, which appears when a package rebuilt after downloading does not match and which makes dnf download the whole package instead of only the delta. Users instinctively associate MD5 with "broken crypto", which decreases trust in dnf even when MD5 serves a non-cryptographic function, as it does here. As one example, a few days ago a user who had been misled by the "md5" in their dnf output opened a topic about it on discussion.fedoraproject.org: MD5 and its reputation can cause confusion and misinterpretation when users see it in use.

I found the message in this repo with git grep --text "md5 mismatch".

This issue is not critical, but it might be considered in future development.

@crrodriguez

Unfortunately, what is being checked is the MD5 signature embedded in the RPM files, so changing the algorithm would break existing packages. The whole design of this thing is bonkers: details of the particular checksum algorithm used by the implementation should never have been visible to callers. But one needs to deal with it 8-(
Today the algorithm selected is wrong, slow and broken.
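The point that the checksum algorithm should not leak to callers can be illustrated with a minimal Python sketch. All names here (`VerifyResult`, `verify_reconstructed`, `apply_delta_or_fallback`) are hypothetical and not taken from deltarpm. The algorithm stays an internal parameter, the user-facing message says only "checksum mismatch", and a mismatch triggers a caller-supplied full download, mirroring the fallback described in the issue. MD5 remains the default here only because, per the comment above, the digests embedded in existing RPMs are MD5.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class VerifyResult:
    ok: bool
    message: str


# Hypothetical API for illustration; not deltarpm code.
def verify_reconstructed(data: bytes, expected_hex: str, algorithm: str = "md5") -> VerifyResult:
    """Compare a rebuilt package against the digest recorded for it.

    The algorithm is an internal detail; it never appears in the message.
    """
    actual = hashlib.new(algorithm, data).hexdigest()
    if actual == expected_hex.lower():
        return VerifyResult(True, "checksum ok")
    return VerifyResult(False, "checksum mismatch of result")


def apply_delta_or_fallback(
    rebuilt: bytes,
    expected_hex: str,
    download_full: Callable[[], bytes],
) -> bytes:
    """Keep the delta-rebuilt package if it verifies, else fetch the full one."""
    result = verify_reconstructed(rebuilt, expected_hex)
    if result.ok:
        return rebuilt
    print(f"{result.message}, downloading the full package instead")
    return download_full()


# Usage: a corrupted rebuild falls back to the (simulated) full download.
good = b"package bytes"
digest = hashlib.md5(good).hexdigest()
print(apply_delta_or_fallback(b"corrupted", digest, lambda: good) == good)  # True
```

With this shape, moving to a different digest later only touches the `algorithm` default and the recorded digests, not the message callers and users see.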

@mlschroe
Contributor

mlschroe commented Feb 8, 2023

It's not wrong at all. The MD5 sums are used as a better CRC function, not in a cryptographically sensitive way.

@mlschroe
Contributor

mlschroe commented Feb 8, 2023

(But yes, something like xxhash would have been a better choice. But it didn't exist at that time.)

@mlschroe
Contributor

mlschroe commented Feb 8, 2023

(And please do not forget that deltarpm was written in 2005...)

@py0xc3
Author

py0xc3 commented Feb 8, 2023 via email

@mlschroe
Contributor

mlschroe commented Feb 8, 2023

Yeah, changing the message so that it no longer mentions md5 is certainly a good idea.

Regarding rpm upstream: see rpm-software-management/rpm#1292


3 participants