Case study: recovery of a corrupted 12 TB multi-device pool (github.com)

by salt4034 65 comments 116 points

[−] yjftsjthsd-h 39d ago

> This is not a bug report. [...] The goal is constructive, not a complaint.

Er, I appreciate trying to be constructive, but in what possible situation is it not a bug that a power cycle can lose the pool? And if it's not technically a "bug" because BTRFS officially specifies that it can fail like that, why is that not in big bold text at the start of any docs on it? 'Cuz that's kind of a big deal for users to know.

EDIT: From the longer write-up:

> Initial damage. A hard power cycle interrupted a commit at generation 18958 to 18959. Both DUP copies of several metadata blocks were written with inconsistent parent and child generations.

Did the author disable safety mechanisms for that to happen? I'm coming from being more familiar with ZFS, but I would have expected BTRFS to also use a CoW model where it wasn't possible to have multiple inconsistent metadata blocks in a way that didn't just revert you to the last fully-good commit. If it does that by default but there's a way to disable that protection in the name of improving performance, that would significantly change my view of this whole thing.
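The copy-on-write expectation described above can be sketched as a toy model. This is not btrfs code: `ToyCowStore`, `Block`, and `tree_is_consistent` are hypothetical names, and the only btrfs-like detail assumed is that a parent records the generation it expects each child to have. It shows why an interrupted commit should normally leave the last committed tree readable; it does not explain what went wrong in the write-up's case.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Block:
    generation: int
    # Each pointer is (child address, generation the parent expects the child to have).
    children: list[tuple[int, int]] = field(default_factory=list)


class ToyCowStore:
    """Toy copy-on-write store: blocks are never overwritten in place."""

    def __init__(self) -> None:
        self.blocks: dict[int, Block] = {}
        self.superblock_root: int | None = None  # address of the committed root
        self._next_addr = 0

    def write_block(self, block: Block) -> int:
        # Every write goes to a fresh address, so old blocks stay intact.
        addr = self._next_addr
        self._next_addr += 1
        self.blocks[addr] = block
        return addr

    def commit(self, new_root: int) -> None:
        # The single pointer flip is the commit point.
        self.superblock_root = new_root


def tree_is_consistent(store: ToyCowStore, addr: int) -> bool:
    """Check that every child carries the generation its parent recorded."""
    block = store.blocks[addr]
    for child_addr, expected_gen in block.children:
        child = store.blocks.get(child_addr)
        if child is None or child.generation != expected_gen:
            return False
        if not tree_is_consistent(store, child_addr):
            return False
    return True


def demo() -> tuple[bool, bool]:
    store = ToyCowStore()
    # Generation 1: one leaf and one root, fully written, then committed.
    leaf1 = store.write_block(Block(generation=1))
    root1 = store.write_block(Block(generation=1, children=[(leaf1, 1)]))
    store.commit(root1)

    # Generation 2 is interrupted: the new root lands, its child does not,
    # and the superblock pointer is never flipped.
    missing_leaf = 999
    root2 = store.write_block(Block(generation=2, children=[(missing_leaf, 2)]))

    committed_ok = tree_is_consistent(store, store.superblock_root)
    interrupted_ok = tree_is_consistent(store, root2)
    return committed_ok, interrupted_ok
```

In this model the half-written generation 2 tree fails the check, but the superblock still points at the generation 1 root, which passes. The situation quoted from the write-up is different: there the inconsistent blocks were reachable from what the filesystem treated as current.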

[−] rincebrain 39d ago
As far as I can see, no, the author disabled nothing of the sort that he documented.

I suspect that the author's intent is less "I do not view this as a bug" and more "I do not think it's useful to get into angry debates over whether something is a bug". I do not know whether this is a common thing on btrfs discussions, but I have certainly seen debates to that effect elsewhere.

(My personal favorite remains "it's not a data loss bug if someone could technically theoretically write something to recover the data". Perhaps, technically, that's true, but if nobody is writing such a tool, nobody is going to care about the semantics there.)

[−] yjftsjthsd-h 39d ago

> I suspect that the author's intent is less "I do not view this as a bug" and more "I do not think it's useful to get into angry debates over whether something is a bug".

Agreed, and I appreciate the attempt to channel things into a productive conversation.

[−] rcxdude 39d ago
btrfs's reputation is not great in this regard.
[−] stingraycharles 39d ago
As far as I understand, single device and RAID1 are solid, but as soon as you want to do RAID1+0 or RAID5/6 you’re entering dangerous territory with BTRFS.
[−] Retr0id 39d ago
Unless I missed it the writeup never identifies a causal bug, only things that made recovery harder.
[−] harshreality 39d ago
Using DUP as the metadata profile sounds insane.

Changing the metadata profile to at least raid1 (raid1, raid1c3, raid1c4) is a good idea, especially for anyone, against recommendations, using raid5 or raid6 for a btrfs array (raid1c3 is more appropriate for raid6). That would make it very difficult for metadata to get corrupted, which is the lion's share of the higher-impact problems with raid5/6 btrfs.

check:

    btrfs fi df <mountpoint>

convert metadata:

    btrfs balance start -mconvert=raid1c3,soft <mountpoint>

(make sure it's -mconvert, where m is for metadata, not -dconvert, which would switch profiles for data, messing up your array)
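For anyone who wants to script the check step, here is a rough sketch that reads the text printed by `btrfs fi df` and reports whether the metadata profile is one that keeps all copies on a single device. The function names and the `SAMPLE` text are made up for illustration (the sample is not output from the pool in the write-up), and it assumes the usual `Type, PROFILE: total=..., used=...` line layout.

```python
import re

# Profiles that keep every metadata copy on one device (or keep only one copy).
SAME_DEVICE_PROFILES = {"single", "dup"}

# Matches the leading "Type, PROFILE:" part of each line.
LINE_RE = re.compile(r"^(?P<kind>\w+),\s*(?P<profile>\w+):")

# Hypothetical sample text in the layout `btrfs fi df` prints.
SAMPLE = """\
Data, single: total=10.00GiB, used=8.00GiB
System, DUP: total=8.00MiB, used=16.00KiB
Metadata, DUP: total=1.00GiB, used=500.00MiB
GlobalReserve, single: total=16.00MiB, used=0.00B
"""


def parse_profiles(fi_df_output: str) -> dict[str, str]:
    """Map block group type (Data, Metadata, System, ...) to its profile."""
    profiles: dict[str, str] = {}
    for line in fi_df_output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            profiles[match["kind"]] = match["profile"]
    return profiles


def metadata_needs_conversion(fi_df_output: str) -> bool:
    """True if metadata uses a profile that cannot survive losing one device."""
    profile = parse_profiles(fi_df_output).get("Metadata", "")
    return profile.lower() in SAME_DEVICE_PROFILES
```

With the sample text, `metadata_needs_conversion(SAMPLE)` returns `True` because metadata is DUP. The sketch ignores mixed block groups (lines starting with `Data+Metadata`), so treat a `False` result with care on such filesystems.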
[−] throwaway270925 39d ago

> A hard power cycle on a 3 device pool (data single, metadata DUP, DM-SMR disks) left the extent tree and free space tree in a state that no native repair path could resolve.

As a ZFS wrangler by day:

People in this thread seem to happily shit on btrfs here, but this is very much not a sane, resilient configuration no matter the FS. Just something to keep in mind.

[−] Retr0id 39d ago
This is obviously LLM output, but perhaps LLM output that corresponds to a real scenario. It's plausible that Claude was able to autonomously recover a corrupted fs, but I would not trust its "insights" by default. I'd love to see a btrfs dev's take on this!
[−] stinkbeetle 39d ago

> Case study: recovery of a severely corrupted 12 TB multi-device pool, plus constructive gap analysis and reference tool set #1107

Please don't be btrfs please don't be btrfs please don't be btrfs...

[−] jamesnorden 39d ago
People swear btrfs is "safe" now, but I've personally been bitten by data corruption more than once, so I stay away from it now.
[−] c-c-c-c-c 39d ago
Added to my list of reasons to never use btrfs in production.
[−] londons_explore 39d ago
Btrfs allows migration from ext4 with a rather good rollback strategy...

Post-migration, a complete disk image of the original ext4 disk will exist within the new filesystem, using no additional disk space due to the magic of copy-on-write.

Why isn't the repair process the same? Fix the filesystem to get everything online asap, and leave a complete disk image of the old damaged filesystem so other recovery processes can be tried if necessary.

[−] guilhas 33d ago
This reads as something between clueless/malicious and genius. Crosses several red lights with a car, smashes the car, rebuilds the car with AI, tells people to cross red lights.

Keeps repeating btrfs check --repair. This command is dangerous and is flagged everywhere as a last resort: if you try to execute it you get a warning; the documentation has a warning; any guide from Google tells you not to run it unless all else fails; ChatGPT/Le Chat do not mention it, or note it as a last resort. So I'm not sure why he keeps repeating it without any note.

> Use these tools ONLY if btrfs check --repair segfaults, enters an infinite loop, or leaves the filesystem in worse shape than before.

> Timeline of events ... First repair attempts. btrfs check --repair

The guy is recommending people brick their volumes permanently as a first resort without any warning.

Between using a DUP profile and this, I would not be surprised if a btrfs dev just disregarded it all as slop.

> Pool only mounts with rescue=all,ro, fails to mount RW

Also, this is important: the data was not lost, even though the pool was read-only.

I don't think I would run this code. Still, it would be interesting for a btrfs dev to have a look and comment on whether there is any value in the generated code. It would definitely be interesting to be able to repair more issues in the pool safely in place.

[−] duskdozer 39d ago
Welp. Guess I need to figure out another fs to use for a few drives in a nonraid pool I haven't gotten around to setting up yet. I forget why zfs seemed out. xfs?
[−] phoronixrly 39d ago
To the author: did you continue using btrfs after this ordeal? An FS that will not eat (all) your data upon a hard power cycle only at the cost of 14 custom C tools is a hard pass from me no matter how many distros try to push it down my throat as 'production-ready'...

Also, impressive work!

[−] lnx01 39d ago
bulletproof/bulletproof/bulletproof .... Gemini LLM