[Trac #112] Board ..0958 Failure

Zarro Boogs per Child bugtracker at laptop.org
Sat Oct 7 19:12:01 EDT 2006


#112: Board ..0958 Failure
-------------------------------+--------------------------------------------
 Reporter:  wmb at firmworks.com  |        Owner:  mfoster
     Type:  defect             |       Status:  new    
 Priority:  normal             |    Milestone:         
Component:  hardware           |   Resolution:         
 Keywords:                     |  
-------------------------------+--------------------------------------------
Old description:

> Hi, Mitch!
>
> This sounds similar to a board I found which had been damaged in
> shipment.  One of the inductors had broken, and the USB port was flaky as
> heck.  Please ship the board back to Quanta, and we'll find out exactly
> what went wrong with the hardware.
>
> Thanks!
> Mark

New description:

 I'm opening this ticket to record information about a particular failed
 board that was sent to me for analysis.

 Identification: The small bar code tag on the bottom says 0017C4000958.

 Failure description: The board had a sticky note that says "Failed
 9/6/06". There was no indication of the system configuration or the
 software being used when the "failure" was detected, so it's difficult to
 know what "failure" means. Jim says that the people who were using the
 board were involved in wireless testing, so perhaps the wireless hardware
 is the problem. That certainly agrees with the results below.

 Configuration: The board had two wireless antennae installed. The SPI
 FLASH contains Insyde BIOS.

 Initial tests:

 a) Attach CRT and serial port and power on - Insyde BIOS screen comes on.
 b) Try to boot my diags off USB key - VSA USB enumeration does not report
    the existence of the USB key, either directly connected or via a
    powered USB 2.0 hub.
 c) Connect a USB keyboard (directly, no hub) and get control of BIOS via
    F1. Change boot order so USB HD is first (instead of floppy first).
 d) Disconnect keyboard, insert USB key. VSA USB enumeration still does
    not see it.
 e) Plug in ROM emulator in PLCC socket, loaded with my LB + diagnostic
    payload. Gets about halfway through LB startup, then dies after
    "Call real_mode_switch_call_vsm".
 f) Another try dies shortly thereafter, saying "PCI: Sanity check
    failed".
 g) Suspecting memory flakiness, I apply the RAM timing patch described in
    trac #108. (The board has Hynix RAMs.)
 h) Now my Open Firmware diag boots just fine, turns on the video screen,
    etc.
 i) I use it to boot Linux (build from yesterday) from USB key. vmlinuz
    and initrd load just fine, Linux starts running. But then I get a
    bunch of messages from the USB stack:

      hub 1-0:1.0: connect-debounce failed, port 4 disabled

    The messages never stop, they just keep coming out every second or so.
    So something is funny in USB land.
 j) I go back to my Open Firmware diagnostic and look at the USB device
    tree. Aha! The Marvell wireless device is not showing up in the tree.

 Next step (after getting some sleep): Figure out what's wrong with the
 Marvell wireless.

 Next day: Step-by-step execution of the USB probe sequence for the port
 connected to the Marvell wireless device. It initially identifies as a
 high-speed device (signified by the value in the portsc@ register after
 the "reset" bit is written). The low bits are 0x3, indicating port status
 change (2) and device present (1). Then I clear the 2 bit by writing to
 it, and about 8 milliseconds later, the 2 bit comes back on (another port
 status change) and the 1 bit goes off (device not present). This behavior
 is repeatable - the device "goes away" as far as the USB bus is concerned
 (but the USB protocol is gnarly, so just about anything could make a
 device disappear). That result is consistent with the repeated "connect
 debounce" messages from Linux.

 Connected a scope to the USBP4 +/- lines (first did it on a good board to
 see what the waveform is supposed to look like). Sure enough, the bad
 board has much different waveforms from the good board. On the bad board,
 as soon as 1 is written to the EHCI CFGFLAG register, the USBP4+ line
 starts pulsing high, from 0V to 4.25V. The pulse width is 3.5 ms, the
 repetition period 86 ms. Nothing even remotely like that pattern occurs
 on the good board, and it doesn't correspond to any USB protocol
 characteristics of which I'm aware. I wonder if that line is shorted to
 something else?

 Another possibility is that the 88W8388 chip is insane. It's a
 System-On-Chip device with its own internal processor, so it could be
 doing arbitrarily strange things. Perhaps its microcode (which is loaded
 from a separate serial FLASH) got corrupted. Lacking development/debug
 tools for that device, discovering what is happening could be very
 difficult.
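
 For anyone retracing the port-status observations above, here is a
 minimal sketch in Python that models only the two low portsc@ bits
 mentioned in this ticket. It is not the Open Firmware code that was
 actually run. The constant and function names are hypothetical; the bit
 meanings (1 = device present, 2 = status change, cleared by writing 1 to
 it) are taken from the description, and the values are the ones reported
 there.

```python
# Illustrative model of the two low PORTSC bits described in the ticket.
# All names below are hypothetical; only the bit values come from the text.
PORTSC_CONNECT_STATUS = 0x1  # "device present" (1)
PORTSC_CONNECT_CHANGE = 0x2  # "port status change" (2), write 1 to clear


def decode_low_bits(portsc):
    """Return (device_present, status_changed) for a PORTSC value."""
    return (bool(portsc & PORTSC_CONNECT_STATUS),
            bool(portsc & PORTSC_CONNECT_CHANGE))


def clear_change(portsc):
    """Model acknowledging the change bit (write-1-to-clear)."""
    return portsc & ~PORTSC_CONNECT_CHANGE


def device_dropped(before, after):
    """True if a fresh status change reports that the device went away."""
    was_present, _ = decode_low_bits(before)
    now_present, changed = decode_low_bits(after)
    return was_present and changed and not now_present


# Sequence reported in the ticket (low two bits only).
after_reset = 0x3                   # status change + device present
acked = clear_change(after_reset)   # change bit acknowledged
later = 0x2                         # about 8 ms later: change set, device gone

if __name__ == "__main__":
    print(decode_low_bits(after_reset))
    print(decode_low_bits(acked))
    print(device_dropped(acked, later))
```

 A check like device_dropped() flags exactly the transition seen on the
 bad board: the device is present after reset, then a new status change
 arrives with the present bit off.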

Comment (by mfoster):

 Replying to [comment:1 wmb at firmworks.com]:
 > No progress this week; not sure when if ever this will bubble up to the
 > top of the stack...

-- 
Ticket URL: <http://dev.laptop.org/ticket/112#comment:3>
One Laptop Per Child <http://laptop.org/>


