LBA NAND corruption
wmb at laptop.org
Tue Oct 21 14:41:48 EDT 2008
David Woodhouse wrote:
> On Tue, 2008-10-21 at 12:22 -0400, John Watlington wrote:
>> One of the LBA NAND test machines killed its MBR.
>> It started with a failed comparison of the commonly
>> written blocks, then stopped talking to the device at
>> On reboot, fdisk showed no partition table.
>> dd of /dev/lba showed all FFs for the first 16K,
>> then 00 for the next 2K, then data.
>> Suggestions on how to proceed with debugging
>> are welcome.
> This is one of the reasons I'm so concerned about this type of device.
This is indeed a serious concern. But it has to be balanced against the
hardware problem that CaFe doesn't work with the next generation of raw
NAND chips. Maybe that hardware problem can be solved, maybe not. At
the moment, there are no obviously-good solutions.
It seems clear to me that the industry is moving rapidly toward managed
NAND. I could be wrong about that, but I don't think I am. It's pretty
hard to win by betting against the volume hardware. If that is true,
then the winning strategy is some combination of making do with what the
industry has to offer and influencing them to fix problems.
> When you're dealing with stuff in software, if you have a bug you can
> whip the developers harder. When something goes wrong inside the
> device's internal firmware, there really isn't much you can do about it
> at all.
As an individual, that is true; you have almost no leverage over the
device vendor. But as a volume customer, you do have leverage. Sun,
even in its early days of modest volumes, was able to get bug fixes for
disk drive and tape drive firmware problems.
On the other side, it's not entirely clear how you would "whip" FOSS
developers harder in general. It appears to me that it's a hit-or-miss
proposition as to whether you can get sufficient attention from a given
expert. For example, consider yourself. When you worked for RH, OLPC
could get lots of your valuable attention because of the OLPC/RH
connection. But now that you are associated with Intel, what is the
situation? (Perhaps we could in fact get some of your cycles; I'm just
saying that the answer doesn't seem obvious and straightforward.)
In summary, it looks to me like there are valid arguments on both sides
of this question.