#6578 NORM Never A: Cafe NAND: Corrected 1 symbol errors - error recovery should be improved
Zarro Boogs per Child
bugtracker at laptop.org
Wed Feb 27 15:38:35 EST 2008
#6578: Cafe NAND: Corrected 1 symbol errors - error recovery should be improved
--------------------+-------------------------------------------------------
Reporter: gnu | Owner: jg
Type: defect | Status: new
Priority: normal | Milestone: Never Assigned
Component: distro | Version:
Keywords: | Verified: 0
Blocking: | Blockedby:
--------------------+-------------------------------------------------------
I received these kernel messages on my MP G1G1 laptop, running
update.1-691, today:
[ 124.794431] CAFÉ NAND 0000:00:0c.0: Corrected 1 symbol errors
[ 124.838537] CAFÉ NAND 0000:00:0c.0: Corrected 1 symbol errors
[ 134.750821] msh0: no IPv6 routers present
[ 146.301007] ADDRCONF(NETDEV_CHANGE): msh0: link becomes ready
[ 157.140197] ADDRCONF(NETDEV_CHANGE): msh0: link becomes ready
[ 162.128785] eth0: no IPv6 routers present
[ 167.780302] CAFÉ NAND 0000:00:0c.0: Corrected 1 symbol errors
[ 167.961249] msh0: no IPv6 routers present
[ 168.741263] CAFÉ NAND 0000:00:0c.0: Corrected 1 symbol errors
It reports a PCI device number (0000:00:0c.0).
It does not report anything further about the error -- not the raw block
or chip address, not the symbol in error, not the high-level inode or
filename involved. This makes it hard to diagnose or even recognize
patterns.
It apparently does not push the error information up a level into the
filesystem, so the filesystem is unable to store a corrected copy of the
file data elsewhere (in case a second error arises in this block, making
it uncorrectable). Thus, the filesystem is apparently rereading this
block several times (producing this message each time). Of course, I
can't tell if it's rereading, or if it is encountering errors in several
different blocks, since it doesn't tell me which block.
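To illustrate how little can be recovered from the messages as they
stand, here is a minimal sketch that summarizes them per PCI device. The
function name summarize and the regular expression are illustrative
assumptions, not part of any existing tool. Only a count and a time range
come out, because the messages carry no block address.

```python
import re

# Matches lines such as:
#   [ 124.794431] CAFÉ NAND 0000:00:0c.0: Corrected 1 symbol errors
# The driver name is matched loosely (\S+) so a mis-encoded name also matches.
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+\S+ NAND (?P<dev>\S+): "
    r"Corrected (?P<n>\d+) symbol errors\s*$"
)


def summarize(lines):
    """Return {device: {"events", "symbols", "first", "last"}} for matching lines."""
    stats = {}
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # unrelated kernel message
        ts = float(m.group("ts"))
        entry = stats.setdefault(
            m.group("dev"),
            {"events": 0, "symbols": 0, "first": ts, "last": ts},
        )
        entry["events"] += 1                    # one message seen
        entry["symbols"] += int(m.group("n"))   # symbols corrected in it
        entry["first"] = min(entry["first"], ts)
        entry["last"] = max(entry["last"], ts)
    return stats
```

Feeding it saved dmesg output, one line per list element, gives one entry
per device. Whether those events are rereads of one block or errors in
several blocks cannot be determined from this input.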
There was info at boot time about a "bad block table":
[ 26.816456] NAND device: Manufacturer ID: 0xad, Chip ID: 0xdc (Hynix NAND 512MiB 3,3V 8-bit)
[ 26.850413] 2 NAND chips detected
[ 26.878771] Bad block table found at page 524224, version 0x01
[ 26.879000] Bad block table found at page 524160, version 0x01
[ 26.879154] nand_read_bbt: Bad block at 0x038a0000
[ 26.879172] nand_read_bbt: Bad block at 0x038c0000
[ 26.879206] nand_read_bbt: Bad block at 0x05bc0000
[ 26.879266] nand_read_bbt: Bad block at 0x0b8a0000
[ 26.879284] nand_read_bbt: Bad block at 0x0b8c0000
[ 26.879754] Searching for RedBoot partition table in NAND 512MiB 3,3V 8-bit at offset 0xfd80000
[ 26.920028] No RedBoot partition table detected in NAND 512MiB 3,3V 8-bit
It is not clear whether single-symbol errors like this will cause a block
of NAND to be added to the "bad block table". I suggest that such blocks be
added to a "provisional bad block table", with a count of errors
encountered. If such a block is rewritten with new data, and continues to
produce errors, it should go into the bad block table and no longer be
used.
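
The bookkeeping suggested above could look roughly like the following
sketch. It is a user-space model in Python, not kernel code. The class
name, method names, and the threshold parameter are all hypothetical, and
it assumes the driver can report which block produced a corrected error,
which it currently does not.

```python
from dataclasses import dataclass


@dataclass
class BlockRecord:
    errors: int = 0                # corrected errors seen in total
    rewritten: bool = False        # rewritten with new data since first error
    errors_after_rewrite: int = 0  # corrected errors seen after that rewrite


class ProvisionalBadBlockTable:
    """Hypothetical model: count corrected errors, retire blocks that keep failing."""

    def __init__(self, threshold=1):
        # Errors tolerated after a rewrite before retiring the block (assumed policy).
        self.threshold = threshold
        self.provisional = {}  # block number -> BlockRecord
        self.bad = set()       # blocks that must no longer be used

    def record_corrected_error(self, block):
        """Note a corrected error; return True if the block is now marked bad."""
        if block in self.bad:
            return True
        rec = self.provisional.setdefault(block, BlockRecord())
        rec.errors += 1
        if rec.rewritten:
            rec.errors_after_rewrite += 1
            if rec.errors_after_rewrite >= self.threshold:
                # Still failing after fresh data was written: retire it.
                del self.provisional[block]
                self.bad.add(block)
                return True
        return False

    def record_rewrite(self, block):
        """Note that a provisional block was rewritten with new data."""
        rec = self.provisional.get(block)
        if rec is not None:
            rec.rewritten = True
```

With the default threshold, a block is retired on the first corrected
error seen after it has been rewritten. A larger threshold would tolerate
occasional errors on a block that is otherwise healthy.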
--
Ticket URL: <http://dev.laptop.org/ticket/6578>
One Laptop Per Child <http://dev.laptop.org>
OLPC bug tracking system