#1905 BLOC Trial-2: Field Return: flash corruption - OpenFirmware complaining of 'unknown node type 2006'.
Zarro Boogs per Child
bugtracker at laptop.org
Fri Jul 6 11:46:31 EDT 2007
#1905: Field Return: flash corruption - OpenFirmware complaining of 'unknown node
type 2006'.
-----------------------+----------------------------------------------------
Reporter: dwmw2 | Owner: wad
Type: defect | Status: assigned
Priority: blocker | Milestone: Trial-2
Component: hardware | Version:
Resolution: | Keywords:
Verified: 0 |
-----------------------+----------------------------------------------------
Changes (by kimquirk):
* summary: flash corruption - OpenFirmware complaining of 'unknown node
type 2006'. => Field Return: flash corruption -
OpenFirmware complaining of 'unknown node type
2006'.
Old description:
> A B2 machine was handed to me which failed to boot from NAND, with
> OpenFirmware complaining of 'unknown node type 2006'.
>
> This is a somewhat bogus diagnostic message from OpenFirmware. It _does_
> understand the node type 0x2006, which is a summary node. It's just that
> these ones have bad CRCs.
>
> There seems to have been corruption on the write path, between CPU, RAM
> and CAFÉ. An example...
>
> {{{
> 01fdb310  30 17 eb 15 21 00 00 3b  85 19 01 e0 36 00 00 00  |0...!..;....6...|
> 01fdb320  a4 e1 55 df 60 05 00 80  3d 06 00 00 3f 06 00 00  |..U.`...=...?...|
> 01fdb330  a1 a5 0a 46 0e 08 00 00  06 7e be ae 18 18 99 b3  |...F.....~......|
> 01fdb340  70 69 6e 6b 5f 72 6f 75  6e 64 2e 67 69 66 ff ff  |pink_round.gif..|
> }}}
>
> This provokes the following report from the kernel:
> JFFS2 notice: (2554) read_direntry: header CRC failed on dirent node at
> 0x1fdb318: read 0xaebe7e06, calculated 0x1432b5ee
>
> The parent inode value of 0x80000560 looks very suspicious. Flipping the
> msb of the byte at 01fdb327 back to a more reasonable 0x00 makes the
> crc32 match what's on the flash.
>
> There are no ECC errors reported -- what's on the flash seems to be what
> reached the CAFÉ in the DMA transfer when this block was being written.
> So this doesn't seem to be an error between CAFÉ and NAND. And the crc32
> seems sane too, so it doesn't seem likely that it's memory corruption or
> program error. I suspect hardware.
>
> I'll look at other nodes (there are many broken ones) and see if there's
> a pattern to the corruption.
New description:
SHF70500388

A B2 machine was handed to me which failed to boot from NAND, with
OpenFirmware complaining of 'unknown node type 2006'.

This is a somewhat bogus diagnostic message from OpenFirmware. It _does_
understand the node type 0x2006, which is a summary node. It's just that
these ones have bad CRCs.

There seems to have been corruption on the write path, between CPU, RAM
and CAFÉ. An example...

{{{
01fdb310  30 17 eb 15 21 00 00 3b  85 19 01 e0 36 00 00 00  |0...!..;....6...|
01fdb320  a4 e1 55 df 60 05 00 80  3d 06 00 00 3f 06 00 00  |..U.`...=...?...|
01fdb330  a1 a5 0a 46 0e 08 00 00  06 7e be ae 18 18 99 b3  |...F.....~......|
01fdb340  70 69 6e 6b 5f 72 6f 75  6e 64 2e 67 69 66 ff ff  |pink_round.gif..|
}}}
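
To make the dump easier to read, here is a minimal sketch that decodes the dirent node starting at 0x1fdb318 (8 bytes into the first row). The field layout is an assumption taken from struct jffs2_raw_dirent in the Linux JFFS2 headers, read as little-endian; the names DIRENT, Dirent, RAW and parse_dirent are made up for this illustration.

```python
import struct
from collections import namedtuple

# Layout assumed from struct jffs2_raw_dirent in the Linux JFFS2 headers;
# all multi-byte fields are taken to be little-endian on this machine.
DIRENT = struct.Struct("<HHIIIIIIBB2sII")
Dirent = namedtuple(
    "Dirent",
    "magic nodetype totlen hdr_crc pino version ino mctime "
    "nsize type unused node_crc name_crc",
)

NODE_ADDR = 0x01FDB318  # address quoted in the kernel message

# Bytes 0x01fdb318..0x01fdb34d, copied from the dump above.
RAW = bytes.fromhex(
    "85 19 01 e0 36 00 00 00 "
    "a4 e1 55 df 60 05 00 80 3d 06 00 00 3f 06 00 00 "
    "a1 a5 0a 46 0e 08 00 00 06 7e be ae 18 18 99 b3 "
    "70 69 6e 6b 5f 72 6f 75 6e 64 2e 67 69 66"
)


def parse_dirent(buf):
    """Split a raw dirent into its fixed header and the name that follows."""
    head = Dirent._make(DIRENT.unpack_from(buf, 0))
    name = bytes(buf[DIRENT.size:DIRENT.size + head.nsize])
    return head, name


head, name = parse_dirent(RAW)
print(f"pino    = {head.pino:#010x}")
print(f"ino     = {head.ino:#x}, version = {head.version:#x}")
print(f"name    = {name.decode('ascii')!r}")
# pino sits at struct offset 12; its most significant byte is the last of 4.
print(f"pino top byte at {NODE_ADDR + 12 + 3:#010x}")
```

Under that layout the 0x80 byte at 01fdb327 is the most significant byte of the parent inode field, and the node CRC field holds the 0xaebe7e06 that the kernel reports as "read".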

This provokes the following report from the kernel:
JFFS2 notice: (2554) read_direntry: header CRC failed on dirent node at
0x1fdb318: read 0xaebe7e06, calculated 0x1432b5ee

The parent inode value of 0x80000560 looks very suspicious. Flipping the
msb of the byte at 01fdb327 back to a more reasonable 0x00 makes the crc32
match what's on the flash.
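
The bit-flip check can be reproduced roughly as follows. This is a sketch under two assumptions that are not stated in the ticket: that the node CRC covers the first 32 bytes of the dirent (everything before the two trailing CRC fields), and that JFFS2's crc32 is the zlib polynomial with a zero seed and no final inversion. jffs2_crc32 and flip_bit are hypothetical helper names. If those assumptions hold, the "as read" line should reproduce the mismatch from the kernel message and the "msb cleared" line should match.

```python
import zlib


def jffs2_crc32(data, seed=0):
    """CRC-32 the way this sketch assumes JFFS2 computes it.

    Assumption: same polynomial as zlib, but without zlib's initial and
    final bit inversion, so both are undone here.
    """
    return (zlib.crc32(data, seed ^ 0xFFFFFFFF) ^ 0xFFFFFFFF) & 0xFFFFFFFF


NODE_ADDR = 0x01FDB318
# The 40-byte fixed part of the dirent, copied from the dump above.
node = bytes.fromhex(
    "85 19 01 e0 36 00 00 00 a4 e1 55 df 60 05 00 80 "
    "3d 06 00 00 3f 06 00 00 a1 a5 0a 46 0e 08 00 00 "
    "06 7e be ae 18 18 99 b3"
)
# Assumed: the node CRC covers everything before the two trailing CRC fields.
COVERED = len(node) - 8
stored = int.from_bytes(node[COVERED:COVERED + 4], "little")


def flip_bit(buf, index, bit):
    """Return a copy of buf with one bit inverted."""
    out = bytearray(buf)
    out[index] ^= 1 << bit
    return bytes(out)


# Clear the msb of the byte at 01fdb327, as described above.
repaired = flip_bit(node, 0x01FDB327 - NODE_ADDR, 7)

for label, buf in (("as read", node), ("msb cleared", repaired)):
    pino = int.from_bytes(buf[12:16], "little")
    calc = jffs2_crc32(buf[:COVERED])
    verdict = "match" if calc == stored else "MISMATCH"
    print(f"{label:12} pino={pino:#010x} calculated={calc:#010x} "
          f"stored={stored:#010x} {verdict}")
```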

There are no ECC errors reported -- what's on the flash seems to be what
reached the CAFÉ in the DMA transfer when this block was being written. So
this doesn't seem to be an error between CAFÉ and NAND. And the crc32
seems sane too, so it doesn't seem likely that it's memory corruption or
program error. I suspect hardware.

I'll look at other nodes (there are many broken ones) and see if there's a
pattern to the corruption.
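
One possible way to look for such a pattern, sketched here rather than taken from the ticket: for each broken node, try every single-bit flip in the CRC-covered bytes and record which one makes the CRC match. The helper names and the CRC convention (zlib polynomial, zero seed, no final inversion) are assumptions, and the data in the self-check is synthetic, not from the flash image.

```python
import zlib


def jffs2_crc32(data, seed=0):
    # Assumed JFFS2 convention: zlib's polynomial without the initial and
    # final inversion that zlib.crc32 applies.
    return (zlib.crc32(data, seed ^ 0xFFFFFFFF) ^ 0xFFFFFFFF) & 0xFFFFFFFF


def find_single_bit_repairs(covered, stored_crc):
    """Return (byte_offset, bit) pairs whose flip makes the CRC match."""
    hits = []
    buf = bytearray(covered)
    for index in range(len(buf)):
        for bit in range(8):
            buf[index] ^= 1 << bit          # try the flip
            if jffs2_crc32(bytes(buf)) == stored_crc:
                hits.append((index, bit))
            buf[index] ^= 1 << bit          # undo it
    return hits


# Synthetic self-check: corrupt one known bit, then recover its position.
good = bytes(range(32))
good_crc = jffs2_crc32(good)
bad = bytearray(good)
bad[15] ^= 0x80
hits = find_single_bit_repairs(bytes(bad), good_crc)
print(hits)
```

Collecting the (byte_offset, bit) hits across many broken nodes, together with each node's flash address, would show whether the errors cluster on one bit or one address alignment.
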
Comment:
adding 'field return' to name
--
Ticket URL: <http://dev.laptop.org/ticket/1905#comment:18>
One Laptop Per Child <http://laptop.org/>