-
Offhand, you would need multiple bit flips to do this. One bit flip in a buffer being checksummed is not enough to get ZFS to do a bad repair write. That said, bit flips can happen anywhere. One interesting place would be the buffer for a write before the checksum is computed. That would cause bad data to be written with a good checksum, and if it is metadata, then in an extreme case the pool could become unimportable. This happened to one of the developers on his personal machine; he debugged it, found the bit flip and wrote a custom tool to fix it. Another interesting place would be the machine code itself. What a bit flip does there depends on how the ISA works, but conceivably one instruction could morph into another, or an operand could change to indicate the wrong register, provided the flip does not turn the instruction into an illegal one.
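To make the "bad data with a good checksum" case concrete, here is a minimal sketch, with Python and SHA-256 standing in for the real write path and pool checksum; the only point is the ordering of the flip and the checksum computation.

```python
import hashlib

def checksum(buf: bytes) -> bytes:
    # Stand-in for the pool checksum (fletcher4, sha256, ... in real ZFS).
    return hashlib.sha256(buf).digest()

data = bytearray(b"important metadata block")

# A bit flips in RAM *before* the checksum is computed ...
data[3] ^= 0x04

# ... so the checksum is computed over the already-corrupted buffer
# and written out alongside the bad data.
stored_checksum = checksum(bytes(data))

# Later, on read, the corrupted block verifies perfectly:
assert checksum(bytes(data)) == stored_checksum
print("corrupt block passes checksum verification")
```

If the flip happened after the checksum had been computed, a later read would detect the mismatch; it is the flip-before-checksum ordering that defeats verification.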
-
Thanks! So for important files, I conclude that comparing the stored file (after emptying the read cache) with the original rules out file corruption. Concerning metadata writes, I wonder whether it makes sense to implement an option to double-check them in order to rule out that problem. There is a chance that ZFS will become more and more popular and be used on systems with non-ECC memory. A general question is to what extent OpenZFS is ready for use in production. There are more open issues for ZFS than for ext4, but I have no experience that would let me interpret these figures adequately. I raised this question
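Concretely, I would do something like the following (hypothetical paths; SHA-256 here is just a convenient way to compare the two files and has nothing to do with the pool's own checksums), after clearing the read cache, e.g. by exporting and re-importing the pool:

```python
import hashlib

def file_digest(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical paths, for illustration only.
original = file_digest("/backup/source/important.dat")
stored = file_digest("/tank/data/important.dat")
print("match" if original == stored else "MISMATCH: stored copy differs")
```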
-
That will not work. First, we do not have a check to know whether metadata being written out is corrupt or not; the assumption is that it is good. That is how you would get bad metadata with a good checksum. Second, when the only in-memory copy of the metadata being written out is corrupt and the checksum generated from it says it is fine, verifying that checksum is not going to catch the problem. That being said, this would be a problem for any filesystem on a machine that does not have ECC, even ones that do not do checksums.
I would not consider that to be comparable for several reasons:
Looking at the kernel.org bugzilla and the e2fsprogs GitHub repository, I do not see Ted Ts'o and other ext4 developers filing bugs against ext4 to track things they found themselves, and I am not sure they are actively bug hunting the way a number of us are, so the number of reports is not a full picture. In terms of QA, every pull request to OpenZFS is subject to a test suite plus stochastic testing in userspace, which catches a number of things. I believe ext4 uses the XFS test suite, although it does not have stochastic testing in userspace, and it does not get the same number of runs: if I recall correctly, Ted Ts'o does a test run once a day, while the ZFS test suite is run dozens of times every day. Lastly, both Linux and OpenZFS use Coverity scans to find potential bugs, and the defect densities are the opposite of the outstanding bug reports: https://scan.coverity.com/projects/openzfs-zfs Our kernel module is at 0.13 unresolved defects per 1000 lines while the ext* kernel modules are at 0.57 unresolved defects per 1000 lines.
-
For what it is worth, CPU L3 caches are typically protected by ECC. Hypothetically speaking, if you made a hypervisor that ran the CPU in no-fill mode, you could use the L3 cache as RAM and implement ECC in software. However, it would be slow, and I have only ever heard of this being done in a proprietary hypervisor to implement software-based memory encryption.
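Purely to illustrate what implementing ECC in software means, here is a toy Hamming(7,4) single-error corrector (nothing like what such a hypervisor would actually use, and far too slow for real memory traffic):

```python
def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]        # d1..d4
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    bits = [p1, p2, d[0], p3, d[1], d[2], d[3]]      # codeword positions 1..7
    return sum(b << i for i, b in enumerate(bits))

def hamming74_correct(word: int) -> int:
    """Correct any single-bit error and return the 4 data bits."""
    bits = [(word >> i) & 1 for i in range(7)]
    s1 = bits[0] ^ bits[2] ^ bits[4] ^ bits[6]
    s2 = bits[1] ^ bits[2] ^ bits[5] ^ bits[6]
    s3 = bits[3] ^ bits[4] ^ bits[5] ^ bits[6]
    syndrome = s1 | (s2 << 1) | (s3 << 2)            # 1-indexed error position
    if syndrome:
        bits[syndrome - 1] ^= 1                      # flip the bad bit back
    return bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)

codeword = hamming74_encode(0b1011)
flipped = codeword ^ (1 << 5)                        # a single bit flip in "memory"
assert hamming74_correct(flipped) == 0b1011
```

Real ECC DIMMs typically use a SECDED code over 64-bit words in hardware; doing anything equivalent in software for every memory access is why the hypothetical hypervisor approach would be so slow.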
-
Of course, in order to maintain independence, the code for metadata computation and all its subroutines (e.g., checksum code) would have to be present twice in memory.
Ok, so verified (importable) backups seem to be unavoidable, even when using ZFS with a redundant disk array and regular scrubbing.
I clearly disagree here, for two reasons. First, the chance of a metadata bit flip is relatively high. In the case of the unimportable pool known to you, was the main node (the main directory metadata block) affected, or just an arbitrary metadata block (one of many)?

Second, the argument concerning safety, here expressed in terms of mathematical improbability, is clearly invalid. For example, as written in the NSA CYBERSECURITY 2020 YEAR IN REVIEW, page 5, the NSA obviously uses its publicly known or similar algorithms for controlling nuclear missiles. You could not argue that accidentally launching a nuclear intercontinental missile, or allowing a third party to launch one by breaking the cryptography of a weak algorithm, is an "outlier"; the algorithm would simply not have been strong enough (the probability too high). In practice, sufficient improbability means infeasibility, and the 20,000 or 30,000 mathematicians at the NSA seem to share my view. Although in a totally different context, you could argue similarly with regard to nuking a filesystem: the design is simply too poor if it allows such an incident (loss of all data) even once. There has never been a successful brute-force attack on a 128-bit key, and to my knowledge not even on an 80-bit key; SHA-1, with 80 bits of security, was broken because of a design flaw, not by brute force. So I would still advocate the duplicate, independent computation of metadata blocks because of their importance, increasing security from maybe 50 to 100 bits and hence increasing the security margin by roughly 50 bits (making an unimportable pool due to metadata corruption about 2^50 = 1125899906842624 times less likely). Given the widespread use of ZFS on non-ECC memory nowadays, this would follow the general ZFS policy of employing checksums and redundancy to protect against loss or corruption of data due to hardware failure.
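To make the proposal concrete, here is a rough sketch of the kind of duplicate, independent computation I mean (illustration only, in Python rather than in the actual ZFS write path; the write hook is a made-up placeholder):

```python
import hashlib

def checksum(buf) -> bytes:
    return hashlib.sha256(buf).digest()

def write_metadata_redundantly(metadata: bytes, write) -> None:
    """Keep two separately stored copies of the buffer, checksum each one,
    and refuse to write if anything disagrees, so a flip in either copy
    (or in one checksum computation) cannot silently become
    'bad data with a good checksum'."""
    copy_a = bytearray(metadata)   # first in-memory copy
    copy_b = bytearray(metadata)   # second, separately allocated copy
    sum_a = checksum(copy_a)
    sum_b = checksum(copy_b)
    if copy_a != copy_b or sum_a != sum_b:
        raise RuntimeError("in-memory corruption detected, refusing to write")
    write(bytes(copy_a), sum_a)

# Example call with a dummy write hook:
write_metadata_redundantly(b"some metadata", lambda data, cksum: None)
```

This roughly doubles the CPU cost per metadata write and, as you pointed out, still cannot catch a flip that happens before the buffer is duplicated, so it narrows the window rather than closing it.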
Would it be sufficient to run a ZFS mirror of three hard disks, from time to time remove one of the disks containing the data (replacing it with a new, empty disk to be resilvered from the data still available on the other two), and do a test import of the pool on the removed disk in read-only mode on another computer, to ensure that this disk is a valid backup and can be stored at a different place?
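Concretely, I imagine something along these lines, using zpool split, which as I understand it turns one mirror member into a separately importable pool (all pool, disk and mountpoint names below are made up, and I have not verified this exact sequence):

```python
import subprocess

def run(*cmd):
    """Print and execute one command, aborting on failure."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Made-up names: pool "tank", remaining member "sdb",
# outgoing disk "sdc" (becomes standalone pool "tankbackup"),
# replacement disk "sdd".
run("zpool", "split", "tank", "tankbackup", "sdc")   # peel one mirror member off as its own pool
run("zpool", "attach", "tank", "sdb", "sdd")         # add the fresh disk; it resilvers from the rest

# On the other computer, check that the removed disk really is a usable backup:
run("zpool", "import", "-o", "readonly=on", "-R", "/mnt/verify", "tankbackup")
run("zpool", "status", "tankbackup")
run("zpool", "export", "tankbackup")
```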
-
Could a random bit flip in memory lead to incorrect self-healing, and hence to file corruption, in ZFS?
I wonder whether a random bit flip in memory might cause a checksum mismatch during reading or scrubbing, causing ZFS to prefer an alternative data block and corrupt an actually correct file on the hard disk by modifying it (self-healing).
Many computers nowadays don't have ECC, and there should be a mechanism to guard against that, such as perhaps re-calculating the checksum a second time in the rare case of a checksum mismatch.
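What I have in mind is roughly the following (a pseudocode-level sketch in Python, not ZFS's actual read path; read_block, expected_checksum and repair are placeholder parameters, not real ZFS interfaces):

```python
import hashlib

def checksum(buf: bytes) -> bytes:
    return hashlib.sha256(buf).digest()

def read_with_double_check(read_block, expected_checksum, repair):
    """Only treat the on-disk block as bad (and trigger a repair write) if a
    second, independent read and checksum computation also disagree, so a
    transient in-memory bit flip cannot cause a spurious 'self-heal'."""
    block = read_block()
    if checksum(block) == expected_checksum:
        return block            # normal case: everything matches
    block = read_block()        # mismatch: re-read into a fresh buffer
    if checksum(block) == expected_checksum:
        return block            # the first mismatch was an in-memory flip
    return repair()             # the block really does appear bad on disk
```

Whether anything like this is feasible in the real ARC/self-healing path is exactly what I am asking.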
Main thread: https://forums.raspberrypi.com/viewtopic.php?p=2089368#p2089368