I've been using btrfs for maybe 10 years now? -- on a single Linux home NAS. I use it in a raid1c3 config (I used to do c2). raid1cN is mirroring with N copies. I have compression on. I use snapshots rarely.
I've had a few issues, but no data loss:
* Early versions of btrfs had an issue where you'd run out of metadata space (if I recall). You had to rebalance, and sometimes add some temporary space to do that.
* One of my filesystems wasn't optimally aligned because btrfs didn't do that automatically (or something like that -- this was a long time ago.) A very very minor issue.
* Corruption (but no data loss, so I'm not sure it's corruption per se...) during a device replacement.
This last one caused no data loss, but a lot of error messages. I started a logical device removal, removed the device physically, rebooted, and then accidentally readded the physical device while it was still removing it logically. It was not happy. I physically removed the device again, finished the logical remove, and did a scrub and the fsck equivalent. No errors.
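For reference, the happy-path version of that recovery looks roughly like this (device names and mount point are hypothetical; this is a sketch of the command sequence, not a transcript of exactly what I ran):

```shell
# Start the logical removal; btrfs migrates data off the device
# before it's released, so don't pull it mid-remove
btrfs device remove /dev/sdX /mnt/nas

# After the remove completes, verify checksums across the filesystem
# (-B runs in the foreground so you see the result)
btrfs scrub start -B /mnt/nas

# The closest thing btrfs has to fsck -- read-only check, run unmounted
btrfs check --readonly /dev/sdY
```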
I think that's a testament to its resiliency, but also a testament to how you can shoot yourself in the foot.
I've never used RAID5/6 on btrfs and don't plan to -- partly because of the scary words around it, but I also assume the rebuild time is longer.
Funny to hear your success; I've managed to break almost every mirror I've entrusted to BTRFS! How? Holding down the power button!
Seemingly regardless of the drives, interface, or kernel, other filesystems paired with LVM or mdraid fail/recover/lie more gracefully. NVMe or SATA (spindles). Demonstrated back-to-back with replacements from different batches.
Truly disheartening, I want BTRFS. I would like to dedicate some time to this, but, well, time remains of the essence. I'm hoping it's something boring like my luck with boards/storage controllers, /shrug.
Well, what are you waiting for? Get your findings to the btrfs-devel mailing list, and include your drive make and model. Even better if it's reproducible.
TLDR: busy, lazy, not properly incentivized. I'll get right on that, boss... or is it Officer? I already said: time. I'd like to spend more of my time with triage before I disrupt others, particularly the developers. I don't mind y'all so much :)
It's reproducible, the scope needs to be reduced. With work. A lot of testing and variable change/reduction. More than I care for.
The problem: R&R, work/money, etc, all compete for a limited amount of time. I'll spend it how I like, Square? Comments win over rigorous testing with my schedule, thanks.
Why don't you try to reproduce it? Better things to do, this isn't the mailing list? Exactly. Pick a reason, there's plenty.
I think of this as fail fast. Fail immediately so it's easy to root-cause the failure, rather than have it be hidden and cause more obscure side effects later.
There was an awesome viral video of someone offloading their frustration and a full mag on an HP printer. Now I can't find the original because it started a trend of copycats.
I had this argument presented to me and I wasn't sure what to do with it.
> Humans are allowed to "absorb" art around them into their brains and generate derivative art. People may copy Miyazaki's style... why shouldn't an AI farm be allowed to?
Let's put aside for a moment that AI may have "consumed" some art without a license (e.g., "google books" - did google purchase every book?).
except lawyers keep saying "fanart is actually technically illegal", and re-singing or changing the lyrics of songs isn't enough to be protected by "fair use" stuff
if anything, I'd campaign for "we should limit copyright because it already doesn't work for AI"
The same legal rule applies to both for determining whether something is a derivative work.
No one is stopping you from using similar proportions or colors as Miyazaki to draw a character. You are also allowed to draw your own interpretation of an electric mouse-like monster.
Copyright infringement occurs if that character looks exactly like say Totoro or Pikachu. That is not “in the style of”, that is copying.
A problem with LLMs is that since their corpus is so large, it is difficult to identify when any given output is crossing that line because a single observer’s knowledge of the works influencing the output is limited. You might feed it a picture of your grandfather and it returns an almost exact copy of a grandfather character from a Miyazaki film you haven’t seen. If you don’t share the output with others, it might never be noticed that the infringement occurred.
The given argument conflates the slightest influence with direct copying. It is a reductive take that, personally, I’ve found emblematic of pro-LLM arguments.
Thanks for helping pick apart the argument presented to me.
I don't like the idea that photos I've published on, say, flickr have been pulled into these. Especially stuff I've published with creative commons non-commercial use.
> People may copy Miyazaki's style... why shouldn't an AI farm be allowed to?
People may take a penny from the tray at the 7-11, so why can't an AI farm take pennies from all the trays? Or take them from a much bigger tray and do it a couple of million times?
I adopted Python type annotations on a new project I was writing. Requirements shifted a lot, as did the implementation.
It was amazing. I could refactor quickly after changing a dataclass, field name, function arguments, type, etc. I just ran mypy and it immediately told me everywhere I needed to update code to reference the new refactored type or data structure.
Only after it was mypy-clean did I run the unit tests.
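A minimal illustration of that workflow (the `Job` dataclass and field names are hypothetical): rename a field, and mypy points at every stale reference before any test runs.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    retries: int  # renamed from "attempts"; mypy flags every stale use


def describe(job: Job) -> str:
    # mypy checks this attribute access against the dataclass definition
    return f"{job.name}: {job.retries} retries"


# A leftover reference like job.attempts would be caught statically:
#   error: "Job" has no attribute "attempts"
print(describe(Job("backup", 3)))
```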
My style has changed over time and part of it is thanks to static type checking in Python. I rarely use dictionaries anymore when what I actually want is a different type that functions will handle down the line. So to transfer data, I usually make frozen dataclasses where I used to use dictionaries. It's more work when you want to add fields on the fly ofc, but it pays dividends anytime the logic becomes more complex.
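A sketch of the dict-to-frozen-dataclass swap (the `Reading` type and its fields are made up for illustration): downstream functions get real fields the type checker can verify, instead of string keys it can't.

```python
from dataclasses import dataclass


# Frozen, so a consumer can't mutate the record as it flows through
@dataclass(frozen=True)
class Reading:
    sensor: str
    value: float


def average(readings: list[Reading]) -> float:
    # mypy knows .value exists and is a float; with list[dict] it couldn't
    return sum(r.value for r in readings) / len(readings)


data = [Reading("t1", 20.0), Reading("t2", 22.0)]
print(average(data))
```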
Agreed -- dataclasses over dicts. And for legacy code I try to move them to typed dictionaries.
Pydantic is also helpful to enforce types on json.
I've also stopped passing around argparse namespaces. I immediately push the argparse namespace into a pydantic class (although a dataclass could also be used.)
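The conversion is a one-liner at the boundary. A sketch using a plain frozen dataclass (the `Options` fields and flags are hypothetical; a pydantic model would slot in the same way and add runtime validation):

```python
import argparse
from dataclasses import dataclass


@dataclass(frozen=True)
class Options:
    host: str
    port: int
    verbose: bool


def parse_options(argv: list[str]) -> Options:
    parser = argparse.ArgumentParser()
    parser.add_argument("--host", default="localhost")
    parser.add_argument("--port", type=int, default=8080)
    parser.add_argument("--verbose", action="store_true")
    ns = parser.parse_args(argv)
    # Convert the untyped Namespace into a typed object immediately;
    # everything downstream takes Options, never a Namespace
    return Options(host=ns.host, port=ns.port, verbose=ns.verbose)


opts = parse_options(["--port", "9000"])
print(opts.host, opts.port)
```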
Yes, but, the longer I use python (for personal and admin tasks mostly), the more the REPL and pytest let me sneak up on the 80% solution to my task at hand and get on with life.
The scope of possibility does not end with a full-on enterprise application, having all of the Bell() and Whistle() classes.
Even better, run `mypy` as part of your LSP setup and you don't even need to wait to run `mypy` to see type errors! If I make a mistake I want to be notified immediately.