[Trac #112] Board ..0958 Failure
Zarro Boogs per Child
bugtracker at laptop.org
Sat Oct 7 19:12:01 EDT 2006
#112: Board ..0958 Failure
-------------------------------+--------------------------------------------
Reporter: wmb at firmworks.com | Owner: mfoster
Type: defect | Status: new
Priority: normal | Milestone:
Component: hardware | Resolution:
Keywords: |
-------------------------------+--------------------------------------------
Old description:
> Hi, Mitch!
>
> This sounds similar to a board I found which had been damaged in
> shipment. One of the inductors had broken, and the USB port was flaky as
> heck. Please ship the board back to Quanta, and we'll find out exactly
> what went wrong with the hardware.
>
> Thanks!
> Mark
New description:
I'm opening this ticket to record information about a particular failed
board that was sent to me for analysis.

Identification: The small bar code tag on the bottom says 0017C4000958.

Failure description: The board had a sticky note that says "Failed
9/6/06". There was no indication of the system configuration or the
software being used when the "failure" was detected, so it's difficult to
know what "failure" means. Jim says that the people who were using the
board were involved in wireless testing, so perhaps the wireless hardware
is the problem. That certainly agrees with the results below.

Configuration: The board had two wireless antennae installed. The SPI
FLASH contains Insyde BIOS.

Initial tests:

a) Attach CRT and serial port and power on - Insyde BIOS screen comes on.

b) Try to boot my diags off USB key - VSA USB enumeration does not report
the existence of the USB key, either directly connected or via a powered
USB 2.0 hub.

c) Connect a USB keyboard (directly, no hub) and get control of BIOS via
F1. Change boot order so USB HD is first (instead of floppy first).

d) Disconnect keyboard, insert USB key. VSA USB enumeration still does not
see it.

e) Plug in ROM emulator in PLCC socket, loaded with my LB + diagnostic
payload. Gets about halfway through LB startup, then dies after "Call
real_mode_switch_call_vsm".

f) Another try dies shortly thereafter, saying "PCI: Sanity check failed".

g) Suspecting memory flakiness, I apply the RAM timing patch described in
trac #108. (The board has Hynix RAMs.)

h) Now my Open Firmware diag boots just fine, turns on the video screen,
etc.

i) I use it to boot Linux (build from yesterday) from USB key. vmlinuz and
initrd load just fine, Linux starts running. But then I get a bunch of
messages from the USB stack:

  hub 1-0:1.0: connect-debounce failed, port 4 disabled

The messages never stop, they just keep coming out every second or so. So
something is funny in USB land.

j) I go back to my Open Firmware diagnostic and look at the USB device
tree. Aha! The Marvell wireless device is not showing up in the tree.

Next step (after getting some sleep): Figure out what's wrong with the
Marvell wireless.

Next day: Step-by-step execution of the USB probe sequence for the port
connected to the Marvell wireless device. It initially identifies as a
high-speed device (signified by the value in the portsc@ register after
the "reset" bit is written). The low bits are 0x3, indicating port status
change (2) and device present (1). Then I clear the 2 bit by writing to
it, and about 8 milliseconds later, the 2 bit comes back on (another port
status change) and the 1 bit goes off (device not present). This behavior
is repeatable - the device "goes away" as far as the USB bus is concerned
(but the USB protocol is gnarly, so just about anything could make a
device disappear). That result is consistent with the repeated
"connect-debounce" messages from Linux.

Connected a scope to the USBP4 +/- lines (first did it on a good board to
see what the waveform is supposed to look like). Sure enough, the bad
board has much different waveforms from the good board. On the bad board,
as soon as 1 is written to the EHCI CFGFLAG register, the USBP4+ line
starts pulsing high, from 0V to 4.25V. The pulse width is 3.5 ms, the
repetition period 86 ms. Nothing even remotely like that pattern occurs on
the good board, and it doesn't correspond to any USB protocol
characteristics of which I'm aware. I wonder if that line is shorted to
something else?

Another possibility is that the 88W8388 chip is insane. It's a
System-On-Chip device with its own internal processor, so it could be
doing arbitrarily strange things. Perhaps its microcode (which is loaded
from a separate serial FLASH) got corrupted. Lacking development/debug
tools for that device, discovering what is happening could be very
difficult.
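
As an illustration only, the register values and scope measurements
reported above can be restated in a small Python sketch. It is not the
Open Firmware diagnostic used for the analysis. The bit masks assume the
standard EHCI PORTSC layout (bit 0 = current connect status, bit 1 =
connect status change, cleared by writing 1 to it). All function and
constant names are hypothetical, and the sketch only models the values,
it does not touch hardware.

```python
# Illustrative model only; names are hypothetical, not from the ticket.
# Masks assume the standard EHCI PORTSC bit layout.
PORTSC_CONNECT = 0x1         # current connect status ("device present")
PORTSC_CONNECT_CHANGE = 0x2  # connect status change, write 1 to clear


def decode_portsc(value):
    """Return (device_present, status_changed) from the low PORTSC bits."""
    return bool(value & PORTSC_CONNECT), bool(value & PORTSC_CONNECT_CHANGE)


def write_one_to_clear(value, mask):
    """Model a write-1-to-clear bit: the masked bits end up cleared."""
    return value & ~mask


def pulse_stats(width_ms, period_ms):
    """Return (duty cycle in percent, repetition rate in Hz)."""
    return 100.0 * width_ms / period_ms, 1000.0 / period_ms


if __name__ == "__main__":
    after_reset = 0x3  # low bits reported after the port reset
    print(decode_portsc(after_reset))            # (True, True)

    cleared = write_one_to_clear(after_reset, PORTSC_CONNECT_CHANGE)
    print(decode_portsc(cleared))                # (True, False)

    later = 0x2  # about 8 ms later: change bit set again, present bit off
    print(decode_portsc(later))                  # (False, True)

    duty, rate = pulse_stats(3.5, 86.0)          # scope readings on USBP4+
    print(f"{duty:.1f}% duty, {rate:.1f} Hz")    # 4.1% duty, 11.6 Hz
```

The last line simply restates the measured pulse train as a duty cycle of
roughly 4% at a repetition rate of roughly 11.6 Hz.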
Comment (by mfoster):
Replying to [comment:1 wmb at firmworks.com]:
> No progress this week; not sure when if ever this will bubble up to the
> top of the stack...
--
Ticket URL: <http://dev.laptop.org/ticket/112#comment:3>
One Laptop Per Child <http://laptop.org/>