
Consumer microSD cards commonly have firmware bugs. A Raspberry Pi running Linux will consistently trigger these bugs given enough time, even if you mount the card read-only. The NAND is fine, but the card will be unreadable until reformatted. You might never see this from a single Pi running at home, but people who deploy large fleets of Pis inevitably run into these problems. Industrial microSD cards aren't perfect, but they are more reliable.


Removing an SD card from its socket during a write operation can brick the card so thoroughly that it no longer works at the block level. Perhaps it's brand dependent, but I'm talking about a well-known name brand. It's rare, but I know of a ziplock bag of bricked SD cards that shows it's not too rare.

I suspect it occurs when a card loses power while in the middle of shifting data around for wear leveling purposes. Presumably, some internal data structure gets borked in a way that the card's embedded MCU firmware didn't anticipate and can't recover from.


It doesn't even have to be during a write operation. NAND needs to be periodically refreshed after a read, so the controller can be shuffling data even on a read-only mounted card. I've read speculation that most consumer cards store their controller firmware on the same NAND that is used as the general storage, and thus are prone to corrupting their own firmware if something goes wrong while shuffling data.

Power loss is definitely a threat to an SD card in a Pi. But I've seen many cards go bad while the system is still running, so it's not the whole story.

Weirdly, I've had a number of cards that entered a state where they would be inaccessible under Linux or Mac OS. But they could easily be recovered by a re-format under Windows. No idea why. Perhaps the manufacturers test a lot more thoroughly with Windows?


That's a good hypothesis re: the MCU firmware on the NAND. If it's true that all NAND needs to be periodically refreshed, then it could be as simple as losing power while one of the firmware sectors is getting refreshed. That seems like an obscure enough scenario that there are dragons lurking around it.

I like to imagine in my head that the MCU in an SD card can detect its voltage dropping quickly enough to get into a safe state most of the time. With all the recommended capacitance around the socket, it might actually have a few milliseconds to work with after external power loss. When you pull the card out of a socket, though, the MCU stops working before it has time to do anything about it.
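For a rough feel of that hold-up window, here is a back-of-the-envelope sketch. It assumes the card draws a constant current while the bulk capacitance discharges linearly, so t = C * (V_start - V_min) / I. The function name and the example numbers (100 µF of capacitance, 20 mA draw) are illustrative assumptions, not measurements; only the 2.7 V lower bound comes from the standard SD supply range.

```python
def holdup_time_s(capacitance_f: float, v_start: float, v_min: float,
                  load_current_a: float) -> float:
    """Estimate how long a capacitor can hold a rail above v_min.

    Assumes a constant-current load, so the voltage falls linearly:
    t = C * (v_start - v_min) / I. Real cards draw varying current,
    so treat the result as an order-of-magnitude estimate only.
    """
    if load_current_a <= 0:
        raise ValueError("load_current_a must be positive")
    usable_drop_v = max(v_start - v_min, 0.0)
    return capacitance_f * usable_drop_v / load_current_a


if __name__ == "__main__":
    # Hypothetical values: 100 uF near the socket, 3.3 V rail,
    # 2.7 V minimum operating voltage, 20 mA card current.
    t = holdup_time_s(100e-6, 3.3, 2.7, 0.020)
    print(f"hold-up time: {t * 1e3:.1f} ms")
```

With less capacitance or a card in the middle of a program/erase burst drawing far more current, the same formula gives well under a millisecond, so the margin depends heavily on the board.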


That's probably true. I don't run just one board, though. I've run ~12 of them 24/7 for years. I also run many other devices from uSD cards that, on occasion, get turned on/off hundreds of times a day while I'm doing development, including forced power cuts.

I haven't had many issues with my consumer cards of choice, SanDisk Ultra A1. I chose them because they are cheap and neither too slow nor too fast (the interface limit is 23 MiB/s, so there's no point using faster cards). They are also made by the actual manufacturer of the NAND chips, not by some packager buying random NAND chips from a bucket on the spot market, and their origin can be verified via scratch codes.

I'm not sure about their failure modes, because all of the cards that have failed on me so far were Samsung ones. (Some or most of them were probably fakes, because I bought them on AliExpress, and Samsung didn't give a crap about consumers being able to verify the origin of a card until very recently, and then only for select cards.)

So far I'm very happy. Cheap and reasonably reliable low power storage.

I guess if I had 10,000s of cards in the wild in harsher conditions, I'd care if 1/100 failed on me per year. But in home conditions, 1/100 per year means 1 card failing every 2-3 years in any of my devices, which is very tolerable, and easy to recover from.
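To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes every card fails independently at the same constant annual rate, which is a simplification. The function names are made up for illustration.

```python
def expected_failures_per_year(num_cards: int, annual_failure_rate: float) -> float:
    """Expected number of failed cards per year across the whole fleet."""
    return num_cards * annual_failure_rate


def years_between_failures(num_cards: int, annual_failure_rate: float) -> float:
    """Average time between failures, assuming independent, constant-rate failures."""
    per_year = expected_failures_per_year(num_cards, annual_failure_rate)
    if per_year <= 0:
        raise ValueError("fleet size and failure rate must be positive")
    return 1.0 / per_year


if __name__ == "__main__":
    rate = 0.01  # the 1/100 per year figure from the comment
    for cards in (12, 40, 10_000):
        print(cards, "cards:",
              f"{expected_failures_per_year(cards, rate):.2f} failures/year,",
              f"one every {years_between_failures(cards, rate):.2f} years")
```

At 1/100 per year, one failure every 2-3 years corresponds to a few dozen cards in use (about 33 to 50), while the same rate across 10,000 cards works out to roughly 100 failures a year.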



