[Trac #1138] iwlist command failure.

Zarro Boogs per Child bugtracker at laptop.org
Thu Mar 29 15:47:30 EDT 2007


#1138: iwlist command failure.
--------------------------------+-------------------------------------------
 Reporter:  swagle at marvell.com  |        Owner:  jcardona
     Type:  defect              |       Status:  closed  
 Priority:  blocker             |    Milestone:  Trial-1 
Component:  wireless            |   Resolution:  fixed   
 Keywords:                      |  
--------------------------------+-------------------------------------------
Comment (by jcardona):

 Replying to [comment:14 dcbw]:
 > I audited the reset calls in the driver.  There are three types; (1) USB
 > port reset, (2) CMD_802_11_RESET soft-reset, and (3) EC power cut.
 > (...)

 Dan, I must apologize for sending you on a wild goose chase.  I jumped to
 the wrong conclusion too early.  Let me explain the problem in more detail
 and better separate facts from speculation.

  * pre 5.220.10 release

 After the merge with the stable branch we encountered PHY register
 corruption problems.  We identified that the problem was caused by
 unprotected accesses to the PHY from several tasks.  We introduced a
 semaphore to make those accesses atomic.  Once we confirmed that the
 register corruption was resolved, we went ahead with the release.
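
 As a rough illustration of the fix described above, here is a minimal
 host-side sketch of a read-modify-write on a PHY register with a lock held
 for the whole sequence.  Everything in it is an assumption made for
 illustration: phy_lock, phy_regs and phy_update are hypothetical names, and
 a POSIX mutex stands in for the RTOS semaphore, whose API is not shown in
 this ticket.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-ins: the real firmware uses its RTOS semaphore and
 * real PHY registers; neither appears in this ticket. */
static pthread_mutex_t phy_lock = PTHREAD_MUTEX_INITIALIZER;
static uint8_t phy_regs[256];   /* simulated PHY register file */

/* Read-modify-write one PHY register with the lock held for the whole
 * sequence, so another task cannot interleave its own access. */
uint8_t phy_update(uint8_t addr, uint8_t mask, uint8_t value)
{
    int rc = pthread_mutex_lock(&phy_lock);
    assert(rc == 0);
    (void)rc;

    uint8_t v = phy_regs[addr];
    v = (uint8_t)((v & ~mask) | (value & mask));
    phy_regs[addr] = v;

    pthread_mutex_unlock(&phy_lock);
    return v;
}
```

 The point of the sketch is only that the read, the modification and the
 write happen under one lock acquisition; without it, two tasks updating
 different bits of the same register can overwrite each other's change.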

  * 5.220.10p1, 5.220.10p2

 This bug was reported.  We could reproduce it on only one of the 10 XOs
 in our office.  We suspected the new semaphore and protected PHY accesses
 by disabling interrupts instead.  The resulting image did not freeze.  As
 we needed an image for the upcoming deadline, we decided to release it
 (5.220.10p3) without having found the root cause of the problem.

 We added a JTAG connector to the XO that was failing.  With the debugger
 we could confirm that two tasks were blocked on the new semaphore.  What
 we could not determine is who had taken the semaphore without returning it
 (that information is not available with this RTOS).  Looking through the
 code we found a USB reset function call that accessed the PHY (hence
 taking the semaphore) while executing in an interrupt context.
 Trying to get a semaphore from an interrupt service routine may cause
 suspension.  What happens next depends on the RTOS: some freeze, others
 return an error.  The user guide just says that suspending from within an
 ISR is "not allowed".
 We [comment:13 concluded that we had found the cause of the problem] and
 went to sleep (with some bad dreams, I must admit...).
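
 One common way to avoid suspending inside an ISR is to take the semaphore
 without waiting and defer the work to a task when it is busy.  The sketch
 below models that idea on a host.  It is not the firmware's code: bin_sem,
 sem_try_take, usb_reset_isr and reset_pending are all hypothetical, and a
 real RTOS would supply its own atomic no-wait take.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a binary semaphore; not the firmware's RTOS API.
 * A real no-wait take must be atomic; this single-threaded model is not. */
typedef struct { bool taken; } bin_sem;

enum sem_result { SEM_OK, SEM_BUSY };

/* Non-blocking take: succeeds only if the semaphore is free. */
static enum sem_result sem_try_take(bin_sem *s)
{
    if (s->taken)
        return SEM_BUSY;
    s->taken = true;
    return SEM_OK;
}

static void sem_give(bin_sem *s)
{
    assert(s->taken);
    s->taken = false;
}

static bin_sem phy_sem;
static bool reset_pending;   /* work deferred from the ISR to a task */

/* Called from interrupt context, so it must never suspend.  If the PHY is
 * busy, record the request and let a task perform the reset later. */
void usb_reset_isr(void)
{
    if (sem_try_take(&phy_sem) == SEM_BUSY) {
        reset_pending = true;
        return;
    }
    /* ... PHY access for the reset would go here ... */
    sem_give(&phy_sem);
}
```

 In this model the ISR either finishes its PHY access immediately or leaves
 a flag behind; it never waits on the semaphore.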

 The next day we studied the code in more detail and followed all the
 execution paths that lead to taking this semaphore.  We found a function
 that, depending on the value of a calibration constant stored in EEPROM,
 would return without releasing the semaphore.  With the debugger we
 confirmed that the error condition occurred on the XOs that were freezing
 and not on the ones that were not.  Another team is now looking at this
 problem to determine why the calibration value is out of range.  Is the
 calibration data incorrect, or are we dealing with an EEPROM read error?
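
 The class of bug described above (an error path that returns with the
 semaphore still held) is usually avoided by routing every path through a
 single release point.  The following is a minimal sketch of that pattern,
 not the actual firmware function: the names, the register and the
 CAL_MIN/CAL_MAX limits are hypothetical, since the ticket does not give
 the real ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names and limits; the ticket does not give the real ones. */
#define CAL_MIN 0x10
#define CAL_MAX 0x7F

static bool phy_sem_taken;      /* models the PHY semaphore */
static uint8_t phy_cal_reg;     /* simulated PHY register */

static void phy_sem_take(void)
{
    assert(!phy_sem_taken);     /* a leaked semaphore would trip here */
    phy_sem_taken = true;
}

static void phy_sem_give(void)
{
    phy_sem_taken = false;
}

/* Apply a calibration value read from EEPROM.  Every path, including the
 * out-of-range error path, goes through the single release at "out". */
int phy_apply_calibration(uint8_t cal)
{
    int rc = 0;

    phy_sem_take();
    if (cal < CAL_MIN || cal > CAL_MAX) {
        rc = -1;    /* a plain "return -1" here would leak the semaphore */
        goto out;
    }
    phy_cal_reg = cal;
out:
    phy_sem_give();
    return rc;
}
```

 With this structure an out-of-range value is still rejected, but the next
 caller can take the semaphore instead of blocking forever.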

-- 
Ticket URL: <http://dev.laptop.org/ticket/1138#comment:16>
One Laptop Per Child <http://laptop.org/>


