[Trac #112] Board ..0958 Failure

Zarro Boogs per Child bugtracker at laptop.org
Sat Oct 7 19:09:53 EDT 2006


#112: Board ..0958 Failure
-------------------------------+--------------------------------------------
 Reporter:  wmb at firmworks.com  |        Owner:  mfoster
     Type:  defect             |       Status:  new    
 Priority:  normal             |    Milestone:         
Component:  hardware           |   Resolution:         
 Keywords:                     |  
-------------------------------+--------------------------------------------
Old description:

> I'm opening this ticket to record information about a particular failed
> board that was sent to me for analysis.
>
> Identification:  The small bar code tag on the bottom says 0017C4000958
>
> Failure description: The board had a sticky note that says "Failed
> 9/6/06".  There was no indication of the system configuration or the
> software being used when the "failure" was detected.  So it's difficult
> to know what "failure" means.  Jim says that the people who were using
> the board were involved in wireless testing, so perhaps the wireless
> hardware is the problem.  That certainly agrees with the results below.
>
> Configuration: The board had two wireless antennae installed.  The SPI
> FLASH contains Insyde BIOS.
>
> Initial tests:
> a) Attach CRT and serial port and power on - Insyde BIOS screen comes on.
>
> b) Try to boot my diags off USB key - VSA USB enumeration does not report
> the existence of the USB key, either directly connected or via a powered
> USB 2.0 hub.
>
> c) Connect a USB keyboard (directly, no hub) and get control of BIOS via
> F1.  Change boot order so USB HD is first (instead of floppy first).
>
> d) Disconnect keyboard, insert USB key.  VSA USB enumeration still does
> not see it.
>
> e) Plug the ROM emulator, loaded with my LB + diagnostic payload, into
> the PLCC socket.  It gets about halfway through LB startup, then dies
> after "Call real_mode_switch_call_vsm".
>
> f) Another try dies shortly thereafter, saying "PCI: Sanity check failed".
>
> g) Suspecting memory flakiness, I apply the RAM timing patch described in
> Trac #108.  (The board has Hynix RAMs.)
>
> h) Now my Open Firmware diag boots just fine, turns on the video screen,
> etc.
>
> i) I use it to boot Linux (build from yesterday) from USB key.  vmlinuz
> and initrd load just fine, and Linux starts running.  But then I get a
> bunch of messages from the USB stack:
>
>   hub 1-0:1.0: connect-debounce failed, port 4 disabled
>
> The messages never stop; they keep coming out every second or so.
>
> So something is funny in USB land.
>
> j) I go back to my Open Firmware diagnostic and look at the USB device
> tree.  Aha!  The Marvell wireless device is not showing up in the tree.
>
> Next step (after getting some sleep): Figure out what's wrong with the
> Marvell wireless.
>
> Next day:
>
> I stepped through the USB probe sequence for the port connected to the
> Marvell wireless device.  It initially identifies as a high-speed device
> (signified by the value in the portsc@ register after the "reset" bit is
> written).  The low bits are 0x3, indicating port status change (2) and
> device present (1).  Then I clear the 2 bit by writing to it, and about
> 8 milliseconds later the 2 bit comes back on (another port status
> change) and the 1 bit goes off (device not present).
>
> This behavior is repeatable - the device "goes away" as far as the USB
> bus is concerned (but the USB protocol is gnarly so just about anything
> could make a device disappear).  That result is consistent with the
> repeated "connect debounce" messages from Linux.
>
> I connected a scope to the USBP4 +/- lines (first on a good board, to see
> what the waveform is supposed to look like).  Sure enough, the bad board
> has very different waveforms from the good board.  On the bad board, as
> soon as 1 is written to the EHCI CFGFLAG register, the USBP4+ line starts
> pulsing from 0 V to 4.25 V.  The pulse width is 3.5 ms and the repetition
> period is 86 ms.  Nothing even remotely like that pattern occurs on the
> good board, and it doesn't correspond to any USB protocol characteristic
> of which I'm aware.  I wonder if that line is shorted to something else?
>
> Another possibility is that the 88W8388 chip is insane.  It's a
> system-on-chip device with its own internal processor, so it could be
> doing arbitrarily strange things.  Perhaps its microcode (which is loaded
> from a separate serial FLASH) got corrupted.  Lacking development/debug
> tools for that device, discovering what is happening could be very
> difficult.
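
The PORTSC behavior in the old description above can be modeled in a few lines. This is a hedged software sketch and not the Open Firmware diagnostic code. The bit positions come from the EHCI 1.0 specification, as do the CONFIGFLAG (0x40) and PORTSC (0x44 + 4*(n-1)) operational-register offsets. Under that spec, the "2" bit is Connect Status Change (write-1-to-clear) and the "1" bit is Current Connect Status. The helper names and the replayed sequence are hypothetical. The sketch also assumes Linux's "port 4" corresponds to the 1-based PORTSC(4).

```python
# Hypothetical model of the EHCI PORTSC bits seen during the probe.
# Offsets and bit positions follow the EHCI 1.0 spec; the helper names
# and the replayed sequence are illustrative only.

CONFIGFLAG = 0x40  # operational-register offset; writing 1 routes ports to EHCI


def portsc_offset(port: int) -> int:
    """Operational-register offset of PORTSC for a 1-based port number."""
    return 0x44 + 4 * (port - 1)


CCS = 1 << 0  # Current Connect Status (read-only): device present
CSC = 1 << 1  # Connect Status Change (write 1 to clear)
PEC = 1 << 3  # Port Enable/Disable Change (write 1 to clear)
OCC = 1 << 5  # Over-current Change (write 1 to clear)
W1C_MASK = CSC | PEC | OCC


def ack_csc_value(portsc: int) -> int:
    """Value to write to acknowledge only CSC: mask out the other
    write-1-to-clear bits so their pending changes survive, then set CSC."""
    return (portsc & ~W1C_MASK) | CSC


def apply_w1c(portsc: int, written: int) -> int:
    """Hardware side of the write, modeling only the W1C behavior:
    change bits that are 1 in `written` are cleared."""
    return portsc & ~(written & W1C_MASK)


def describe(portsc: int) -> str:
    present = "device present" if portsc & CCS else "no device"
    change = "connect change" if portsc & CSC else "no change"
    return f"low bits 0x{portsc & 0x3:x}: {present}, {change}"


print(f"CONFIGFLAG at 0x{CONFIGFLAG:02x}, PORTSC(4) at 0x{portsc_offset(4):02x}")

portsc = CCS | CSC                       # 0x3 right after the port reset
print("after reset:", describe(portsc))
portsc = apply_w1c(portsc, ack_csc_value(portsc))
print("after ack:  ", describe(portsc))  # 0x1: present, change cleared
portsc = (portsc & ~CCS) | CSC           # simulated: device drops ~8 ms later
print("~8 ms later:", describe(portsc))  # 0x2: gone, change set again
```

Masking the other change bits before setting CSC follows the usual pattern for write-1-to-clear registers. Acknowledging one change should not silently discard another that arrived in the meantime.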
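
The repeated "connect-debounce failed" messages in the old description above fit a debounce rule of the following kind. A hub driver polls the port and accepts the new state only once it has stayed unchanged for a stable interval. The sketch below models such a rule in the style of the kernel's hub_port_debounce(), under stated assumptions:

- The 25 ms poll step, 100 ms stable time, and 2000 ms timeout are modeled on constants in recent Linux `drivers/usb/core/hub.c`; the 2006 kernel may have used different values.
- The function and the flickering-port pattern are hypothetical.
- Unlike the kernel, the sketch samples only the connect bit and ignores the connect-change bit.

```python
from typing import Callable, Optional

# Assumed values, modeled on recent Linux hub.c; not taken from the 2006 kernel.
STEP_MS = 25
STABLE_MS = 100
TIMEOUT_MS = 2000


def debounce(connected_at: Callable[[int], bool]) -> Optional[bool]:
    """Poll a simulated connect bit every STEP_MS. Return its value once it
    has been unchanged for STABLE_MS, or None if that never happens before
    TIMEOUT_MS (the 'debounce failed' case)."""
    last = connected_at(0)
    stable = 0
    t = 0
    while t < TIMEOUT_MS:
        t += STEP_MS
        now = connected_at(t)
        if now == last:
            stable += STEP_MS
            if stable >= STABLE_MS:
                return now
        else:
            stable = 0   # state changed: restart the stability window
            last = now
    return None


print(debounce(lambda t: True))                # True: settles after 100 ms
print(debounce(lambda t: (t // 50) % 2 == 0))  # None: never stable (hypothetical flicker)
```

A port whose connect state never holds still for the stable interval runs out the timeout. That is roughly the condition under which the kernel logs "connect-debounce failed, port N disabled". Each new connect change then restarts the cycle, which would account for the messages repeating.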
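
For reference, the scope measurements in the old description above work out to a pulse train of roughly 11.6 Hz with about a 4% duty cycle. Here t_w is the 3.5 ms pulse width and T is the 86 ms repetition period:

```latex
f = \frac{1}{T} = \frac{1}{86\,\mathrm{ms}} \approx 11.6\,\mathrm{Hz},
\qquad
D = \frac{t_w}{T} = \frac{3.5\,\mathrm{ms}}{86\,\mathrm{ms}} \approx 0.041 \; (4.1\%)
```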

New description:

 Hi, Mitch!

 This sounds similar to a board I found which had been damaged in shipment.
 One of the inductors had broken, and the USB port was flaky as heck.
 Please ship the board back to Quanta, and we'll find out exactly what went
 wrong with the hardware.

 Thanks!
 Mark

Comment (by mfoster):

 Replying to [ticket:112 wmb at firmworks.com]:

-- 
Ticket URL: <http://dev.laptop.org/ticket/112#comment:2>
One Laptop Per Child <http://laptop.org/>


