Proposal for a new profile to repair files using 3 files. #977
The triple-copy block group has been asked for in the past, though not with the additional majority-rule repair strategy. I am not against it in principle. The usual answer for that amount of redundancy is to use more reliable hardware from the beginning, but there are situations where this can help. One example is an SD card in something like a Raspberry Pi, where power spikes can damage the card even though it is otherwise a relatively stable hardware environment. The RAID1C3 profile provides that redundancy level but requires 3 devices, whereas a triple-copy profile would also work on one device, which does not need to be a single physical device but could be some compound device. The majority-rule strategy could possibly be applied to the RAID1C3 and RAID1C4 profiles too, though it would be good to hide it behind a configuration option, as it would be able to ignore failing checksums.
I didn't even think of that.
I would imagine performing 3 checksums (in the worst-case scenario) is faster than going straight to the majority rule repair strategy (MRRS). If we went straight to the MRRS, that would be a minimum of 3 system calls along with the logic of the MRRS. ...or did I misunderstand what you said?
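To make the worst-case-versus-MRRS comparison concrete, here is a minimal sketch of a checksum-first read path over three copies of one block. It is an illustration, not Btrfs code: the function names are hypothetical, the copies are passed in as a plain list, and `zlib.crc32` stands in for Btrfs's default crc32c. It counts how many checksums are computed before a good copy is found; voting would only be considered once every copy has failed.

```python
import zlib
from typing import List, Optional, Tuple


def checksum(block: bytes) -> int:
    # Stand-in for Btrfs's crc32c: zlib.crc32 is plain CRC-32, used only for illustration.
    return zlib.crc32(block)


def checksum_first_read(copies: List[bytes], expected: int) -> Tuple[Optional[bytes], int]:
    """Try each copy in order; return (first copy that verifies or None, checksums computed)."""
    computed = 0
    for copy in copies:
        computed += 1
        if checksum(copy) == expected:
            return copy, computed  # stop at the first copy that verifies
    # Worst case: all copies were checksummed and failed; only now would voting be considered.
    return None, computed


good = b"A" * 4096
bad = b"B" * 4096
expected = checksum(good)

block, n = checksum_first_read([good, bad, bad], expected)
print(block == good, n)   # best case: one checksum
block, n = checksum_first_read([bad, bad, good], expected)
print(block == good, n)   # worst case that still succeeds: three checksums
block, n = checksum_first_read([bad, bad, bad], expected)
print(block is None, n)   # every copy failed: the point where a fallback would start
```

The design point is that voting is a fallback: in the common case one read plus one checksum settles the block, so any majority logic sits behind the path where all checksums have already failed.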
MRR is a poor fit for production use when firmware may drop writes. If a device loses a write after acknowledging it, all copies may be wrong in the same way, and voting doesn't help if they're identically wrong. This failure mode already affects DUP and RAID1. A third copy doesn't fix it; it just adds I/O and wear without improving reliability. MRR assumes independent failures, but silent write drops are systemic. That said, MRR could be useful in a supervised recovery tool (e.g.,
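A small simulation of the dropped-write case makes the point concrete. It assumes, for illustration only, that the checksum for the new contents was recorded while the data write was silently dropped on every copy; `majority_vote` is a hypothetical helper and `zlib.crc32` stands in for crc32c. The vote is unanimous and still returns stale data, while the checksum flags the mismatch.

```python
import zlib
from collections import Counter
from typing import List, Optional


def majority_vote(copies: List[bytes]) -> Optional[bytes]:
    """Return the contents held by a strict majority of copies, or None if there is none."""
    value, count = Counter(copies).most_common(1)[0]
    return value if count * 2 > len(copies) else None


old = b"old contents".ljust(4096, b"\0")
new = b"new contents".ljust(4096, b"\0")

# Assumption: the checksum of the new contents was stored, but the device
# dropped the data write on every copy, leaving identical stale blocks.
expected = zlib.crc32(new)
copies = [old, old, old]

winner = majority_vote(copies)
print(winner == old)                   # the vote is unanimous, but it picks the stale data
print(zlib.crc32(winner) == expected)  # the checksum still detects the mismatch
```

Because the three failures are correlated, adding copies only adds more votes for the same wrong answer.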
I see the opposite. Metadata is already well protected against dropped writes and corruption—so well, in fact, that applying MRR here would be more likely to introduce new failure modes than prevent them. It would be a disaster for the same reason running Data, by contrast, is more exposed. There are valid cases for both MRR and bypassing csum verification:
This may reflect confusion between datacow and nodatacow behavior. In datacow mode, Btrfs stores checksums for file data, so error detection is possible even with a single copy. If corruption is detected, a redundant copy (e.g. via raid1) allows correction. So raid1, raid1c3, and dup all provide error correction, not just detection; single mode offers detection only, as there’s nothing to correct against. In nodatacow mode, there are no data checksums, so detection depends entirely on redundancy: two mismatching copies indicate an error but give no guidance about which is correct. Three or more copies (e.g. raid1c3) make MRR-style voting possible, but even then, MRR is only meaningful when failures are independent, which isn’t the case if a device drops writes. So while MRR could help in some nodatacow scenarios, it’s no substitute for csums. And in datacow mode, checksums already provide stronger guarantees than majority voting can.
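The distinction above can be modeled as a small decision function. This is a sketch of the reasoning in the comment, not Btrfs's actual read path: `classify` is a hypothetical name, `expected=None` models nodatacow (no checksum stored), and `zlib.crc32` stands in for crc32c.

```python
import zlib
from collections import Counter
from typing import List, Optional


def classify(copies: List[bytes], expected: Optional[int] = None) -> str:
    """Describe what can be done with these copies of one block.

    expected is the stored checksum (datacow); None models nodatacow.
    """
    if expected is not None:
        good = [c for c in copies if zlib.crc32(c) == expected]
        if len(good) == len(copies):
            return "ok"
        if good:
            return "corrected"  # a verified copy exists to repair the others
        return "detected"       # the error is seen, but no good copy remains
    # nodatacow: only the copies themselves can be compared.
    if len(set(copies)) == 1:
        return "ok"             # indistinguishable from copies that are identically wrong
    _, count = Counter(copies).most_common(1)[0]
    if count * 2 > len(copies):
        return "voted"          # a strict majority exists, e.g. 2 of 3 copies
    return "detected"           # e.g. two mismatching copies: no basis to pick one


good, bad = b"G" * 4096, b"X" * 4096
print(classify([bad], zlib.crc32(good)))        # single, datacow
print(classify([bad, good], zlib.crc32(good)))  # dup/raid1, datacow
print(classify([good, bad]))                    # dup/raid1, nodatacow
print(classify([good, bad, good]))              # raid1c3, nodatacow
```

With a checksum, one verified copy is enough to correct; without one, two copies can only signal a mismatch, and a vote needs at least three copies plus the independence assumption.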
Proposal
I have an idea of how to handle auto-repair on read and other features that use Btrfs's error detection of files.
At the time of writing, Btrfs supports the DUP, RAID1-like, and (experimentally) RAID5/6 profiles. These profiles can be used on the data, metadata, and system block groups. From what I understand, they all work similarly:
The problem is that if both files are corrupt (even if they are corrupt on different blocks), there is no way to recover the file. I would like to propose a new profile that uses 3 files to provide error detection AND error correction.
Execution
The execution is similar to the DUP profile, but with extra steps:
If all 3 files are corrupt, Btrfs attempts to make a new file by using the majority rule on a block-by-block basis -- Btrfs compares the first block of each file and uses the one that appears at least two times for the new file. Btrfs then compares the second block of each file and so on until all of the blocks are compared. In the end, Btrfs makes a new file that then gets checksummed just like the original 3 files; if this new file passes the checksum, it is used and the original 3 files are replaced with this new one. If it fails, then it's unrecoverable.
If all three blocks are the same, use that block for the new file.
If two blocks are the same, but one is different, use a block from the matching two for the new file.
If all three blocks are different, Btrfs can stop here because the file is deemed unrecoverable.
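The majority-rule step can be sketched as follows. This is a minimal illustration of the proposal, not Btrfs code: it assumes a file is a byte string split into fixed 4096-byte blocks, follows the proposal's model of one checksum over the whole rebuilt file (real Btrfs checksums each data block individually), and uses `zlib.crc32` in place of crc32c. The names `split_blocks`, `majority_rebuild`, and `corrupt` are hypothetical.

```python
import zlib
from collections import Counter
from typing import List, Optional

BLOCK_SIZE = 4096  # assumed block size for this sketch


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> List[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]


def majority_rebuild(copies: List[bytes], expected: int) -> Optional[bytes]:
    """Rebuild a file from three corrupt copies using the block-wise majority rule."""
    blocked = [split_blocks(c) for c in copies]
    if len({len(b) for b in blocked}) != 1:
        return None  # copies disagree on length: treat as unrecoverable
    rebuilt = []
    for candidates in zip(*blocked):
        block, count = Counter(candidates).most_common(1)[0]
        if count < 2:
            return None  # all three blocks differ: unrecoverable
        rebuilt.append(block)  # covers "all three match" and "two of three match"
    result = b"".join(rebuilt)
    # The rebuilt file must still pass the stored checksum before it replaces the copies.
    return result if zlib.crc32(result) == expected else None


def corrupt(data: bytes, block_index: int, mask: int = 0xFF) -> bytes:
    """Flip bits in the first byte of one block (test helper)."""
    b = bytearray(data)
    b[block_index * BLOCK_SIZE] ^= mask
    return bytes(b)


original = bytes(range(256)) * 48  # 12288 bytes = 3 blocks
expected = zlib.crc32(original)
# Every copy is corrupt, but each one in a different block.
copies = [corrupt(original, 0), corrupt(original, 1), corrupt(original, 2)]
print(majority_rebuild(copies, expected) == original)
```

The final checksum matters: if two copies happen to be corrupted identically in the same block, the vote picks the wrong block and only the checksum catches it.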
Pros
Cons
Additional comments
I figured I'd bring this idea to the Btrfs project because I think that sacrificing 2/3 of storage for error correction is better than sacrificing 1/2 of storage for error detection. I don't see many people using this for the data block group, but I can definitely see it being the default for the metadata and system block groups, as they're already set to DUP by default on single drives.
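The storage trade-off is simple arithmetic: with n copies of every block, the usable fraction is 1/n and the overhead is 1 - 1/n. The sketch below applies that to a hypothetical 64 GiB SD card, ignoring metadata and allocation overhead; the 3-copy entry is the proposed profile, not an existing one.

```python
from fractions import Fraction

# Copies kept of every block per profile ("proposed 3-copy" is hypothetical).
COPIES = {"single": 1, "DUP": 2, "proposed 3-copy": 3}


def usable_fraction(copies: int) -> Fraction:
    """Fraction of raw capacity left for unique data when every block is stored `copies` times."""
    return Fraction(1, copies)


raw_gib = 64  # hypothetical SD card size
for profile, n in COPIES.items():
    usable = usable_fraction(n)
    print(f"{profile}: usable {usable} ({float(raw_gib * usable):.2f} GiB), overhead {1 - usable}")
```

Going from DUP to three copies shrinks usable space from 1/2 to 1/3 of the raw capacity, which is the 2/3 overhead the proposal refers to.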