#9423 NORM 1.5-F11: Stabilize OFW USB handling on XO-1.5

Zarro Boogs per Child bugtracker at laptop.org
Thu Oct 29 02:41:40 EDT 2009


#9423: Stabilize OFW USB handling on XO-1.5
-------------------------------------------+--------------------------------
           Reporter:  wmb at firmworks.com    |       Owner:  wmb at firmworks.com
               Type:  defect               |      Status:  assigned         
           Priority:  normal               |   Milestone:  1.5-F11          
          Component:  ofw - open firmware  |     Version:  1.5-B2           
         Resolution:                       |    Keywords:                   
        Next_action:  test in release      |    Verified:  0                
Deployment_affected:                       |   Blockedby:                   
           Blocking:                       |  
-------------------------------------------+--------------------------------

Comment(by wmb at firmworks.com):

 Most interesting.

 I ran a repeated-boot-from-the-bad-stick test with a /boot/olpc.fth on
 that stick containing just "bye" (after a comment line as required by
 bootable .fth files).  That way the system will continually reboot until
 the USB stick flakes out.  I was trying to find out if the recent change
 to add additional power-off time changes the frequency of the USB stick
 problem.

 The extra time doesn't help - the stick still fails on about 1 in 6
 reboots.
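
 As a rough guide to how long such a reboot test has to run before a
 clean streak means anything, here is a small Python sketch.  It assumes
 each reboot fails independently with the same probability, which is an
 assumption and not something measured here; the function name is made up
 for illustration.

```python
import math

def clean_reboots_needed(failure_rate, confidence=0.95):
    """Smallest n such that n clean reboots in a row would happen with
    probability below (1 - confidence) if the failure rate were unchanged.

    Assumes independent reboots with a constant failure probability.
    """
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))

# With the observed rate of about 1 failure in 6 reboots:
print(clean_reboots_needed(1 / 6))
```

 Under that assumption, a handful of clean reboots after a change says
 very little; the streak has to be well into double digits.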

 But I managed to catch it in the act of failing - there is a long delay
 while it times out - and interrupt the program so I could use Forth to
 inspect everything.  A deep analysis suggests a DMA problem.  The data
 result of a "read blocks" operation is present in memory, and the result
 of the "read transaction status" followup is similarly in memory, but the
 status bits in the transfer descriptor say that the operation is still
 active.  It is as if the DMA write to update the descriptor didn't hit the
 memory.
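
 The check described above can be sketched in Python.  This is only an
 illustration, not the OFW code: it assumes an EHCI-style transfer
 descriptor whose token word carries an Active bit (bit 7) and a Halted
 bit (bit 6), and it assumes the "read transaction status" reply is a
 Bulk-Only Transport command status wrapper beginning with the signature
 "USBS".  The function and argument names are hypothetical.

```python
QTD_ACTIVE = 1 << 7   # assumed EHCI qTD token layout: bit 7 = Active
QTD_HALTED = 1 << 6   # assumed EHCI qTD token layout: bit 6 = Halted
CSW_SIGNATURE = b"USBS"  # assumed Bulk-Only Transport status signature

def classify_transfer(token, data_buf, status_buf):
    """Classify a transfer from a snapshot of memory.

    token      -- the descriptor's token word as read from memory
    data_buf   -- bytes where the "read blocks" data should land
    status_buf -- bytes where the status reply should land
    """
    data_arrived = any(data_buf)  # buffer was zero-filled before the read
    status_arrived = status_buf[:4] == CSW_SIGNATURE
    if token & QTD_HALTED:
        return "halted"
    if token & QTD_ACTIVE:
        if data_arrived and status_arrived:
            # Payload and status reached memory, yet the descriptor
            # write-back did not: the case seen on the bad stick.
            return "stale-descriptor"
        return "in-progress"
    return "complete"

# Data and status are in memory, descriptor still marked active:
print(classify_transfer(QTD_ACTIVE, b"\x01" * 512, b"USBS" + bytes(9)))
```

 The point of the sketch is that the payload and the descriptor are
 written by separate DMA writes, so one can land without the other.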

 This picture agrees with another datapoint I have.  Some time ago I
 captured some USB bus analysis traces (using the Ellisys USB Explorer) of
 both successful and failing operations on this USB stick.  In the failing
 case, the USB bus trace shows a successful "read transaction status"
 sequence on the USB bus, but instead of then proceeding to the next
 operation (as with the good trace), the host controller "endlessly"
 retries an "IN" operation for which it never gets a response from the
 device.

 That is what you would expect if the descriptor update failed to "take".
 The descriptor would remain in the active state, so the host controller
 would keep trying to service it, repeating an "IN" request that the device
 won't honor because it has already responded and the response has been
 ACKed.

 Why does this seem to happen only with the one brand of USB stick?  Why
 does it only happen at this one point in the sequence, after having
 already done some other sub-transfers?  Why does it then seem to work
 perfectly if you let that transaction time out and restart the whole
 procedure?

 Could this be related to the "PCI" problems that jnettlet is seeing with
 graphics acceleration ops?

-- 
Ticket URL: <http://dev.laptop.org/ticket/9423#comment:5>
One Laptop Per Child <http://laptop.org/>
OLPC bug tracking system

