
RFE: Allow mdadm style "raid10" on 3 or other odd numbers of PVs #162

Open
sdgathman opened this issue Oct 22, 2024 · 2 comments

@sdgathman

I've been a long-time user of mdadm "raid10" (which is not actually raid1+0) on 3 disks. For those new to "linux raid10", a few articles have been written on it, e.g. https://serverfault.com/questions/139022/explain-mds-raid10-f2

In essence, it is raid1 with a clever segment allocation scheme. Since "raid10" is commonly understood as "raid1+0", maybe LVM could have a segtype of "raid1e" or similar rather than overloading "raid10" as mdadm does.
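For concreteness, this is roughly what the mdadm side of this looks like today; the device names here are just placeholders:

```sh
# Create a 3-device mdadm "raid10" with the far-2 layout
# (device names are placeholders for this sketch)
mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=3 \
    /dev/sda1 /dev/sdb1 /dev/sdc1
```

A hypothetical LVM counterpart would be a segtype (e.g. "raid1e") that lvcreate accepts with an odd number of PVs; no such segtype exists today.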

Alternatively, LVM could have a configurable allocation policy for the LV which accomplishes something similar. Currently, allocating "raid1" LVs uses all available segments on the first two drives before touching the third drive. Is this intentional (saving the 3rd PV as a spare)?
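(For anyone who wants to see that allocation behaviour for themselves, the per-image placement is visible with lvs; the VG name below is a placeholder:)

```sh
# Show which PVs each raid1 image/metadata sub-LV in vg0 was allocated on
lvs -a -o lv_name,segtype,devices vg0
```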

Why not just get a 4th drive?

Low-end servers come with 4 drive slots. raid1+0 on 4 drives means you can survive 1 drive failure, but the next failure has a 33% chance of destroying all data on the array (one of the 3 remaining drives is now critical). Mdadm raid10 on 3 drives plus a spare means you get 1 drive failure, and md immediately brings the spare online and starts syncing. When the sync is done, you can take one more drive failure with no issue. This means fewer site visits.

Using 3 drives with a striping allocation scheme also means higher performance than raid1 on 2 drives.

Why not just use mdadm (like I have been for decades)?

mdadm does not report which sectors (even as a first/last range) are affected by mismatch_cnt. LVM doesn't either, BUT the problem is narrowed down to one LV, which is a huge win over mdadm in that respect. Large non-ECC drive caches and flaky SSDs are increasingly likely to fail to report corrupted data. (Plus similar issues with a non-ECC desktop.) 256 MB or more of non-ECC DRAM is a significant risk of bit flips from cosmic rays.
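To illustrate the difference, here is roughly how the two checks look; the array, VG, and LV names are placeholders:

```sh
# mdadm: one counter for the whole array, no hint which data is affected
cat /sys/block/md0/md/mismatch_cnt

# LVM: scrub and report mismatches per LV, which at least narrows it to one LV
lvchange --syncaction check vg0/mylv
lvs -o lv_name,raid_sync_action,raid_mismatch_count vg0/mylv
```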

@sdgathman sdgathman changed the title Allow mdadm style "raid10" on 3 or other odd numbers of PVs RFE: Allow mdadm style "raid10" on 3 or other odd numbers of PVs Oct 22, 2024
@sdgathman
Author

Workaround for the time being: when creating an LV, run "pvs" and select the two PVs with the most free space for the new LV. This is similar to what btrfs does, and for the use case of running VMs, it should still get parallel I/O across drives.
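Something along these lines; the VG, LV name, size, and PV paths are placeholders:

```sh
# List free space per PV and pick the two with the most room
pvs -o pv_name,pv_free --units g

# Restrict the new raid1 LV to those two PVs explicitly
lvcreate --type raid1 -m1 -L 100G -n newlv vg0 /dev/sdb1 /dev/sdc1
```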
