#9423 NORM 1.5-F11: Stabilize OFW USB handling on XO-1.5
Zarro Boogs per Child
bugtracker at laptop.org
Thu Oct 29 02:41:40 EDT 2009
#9423: Stabilize OFW USB handling on XO-1.5
-------------------------------------------+--------------------------------
Reporter: wmb at firmworks.com | Owner: wmb at firmworks.com
Type: defect | Status: assigned
Priority: normal | Milestone: 1.5-F11
Component: ofw - open firmware | Version: 1.5-B2
Resolution: | Keywords:
Next_action: test in release | Verified: 0
Deployment_affected: | Blockedby:
Blocking: |
-------------------------------------------+--------------------------------
Comment(by wmb at firmworks.com):
Most interesting.
I ran a repeated-boot-from-the-bad-stick test with a /boot/olpc.fth on
that stick containing just "bye" (after a comment line as required by
bootable .fth files). That way the system will continually reboot until
the USB stick flakes out. I was trying to find out whether the recent
change that adds extra power-off time affects how often the USB stick
problem occurs.
The extra time doesn't help - the stick tends to fail about 1 in 6
reboots.
But I managed to catch it in the act of failing - there is a long delay
while it times out - and interrupt the program so I could use Forth to
inspect everything. A deep analysis suggests a DMA problem. The data
result of a "read blocks" operation is present in memory, and the result
of the "read transaction status" followup is similarly in memory, but the
status bits in the transfer descriptor say that the operation is still
active. It is as if the DMA write to update the descriptor didn't hit the
memory.
This picture agrees with another datapoint I have. Some time ago I
captured some USB bus analysis traces (using the Ellisys USB Explorer) of
both successful and failing operations on this USB stick. In the failing
case, the USB bus trace shows a successful "read transaction status"
sequence on the USB bus, but instead of then proceeding to the next
operation (as with the good trace), the host controller "endlessly"
retries an "IN" operation for which it never gets a response from the
device.
That is what you would expect if the descriptor update failed to "take".
The descriptor would remain in the active state, so the host controller
would keep trying to service it, repeating an "IN" request that the device
won't honor because it has already responded and the response has been
ACKed.
Why does this seem to happen only with the one brand of USB stick? Why
does it only happen at this one point in the sequence, after having
already done some other sub-transfers? Why does it then seem to work
perfectly if you let that transaction time out and restart the whole
procedure?
Could this be related to the "PCI" problems that jnettlet is seeing with
graphics acceleration ops?
--
Ticket URL: <http://dev.laptop.org/ticket/9423#comment:5>
One Laptop Per Child <http://laptop.org/>
OLPC bug tracking system
More information about the Bugs mailing list