[Trac #1138] iwlist command failure.
Zarro Boogs per Child
bugtracker at laptop.org
Thu Mar 29 15:47:30 EDT 2007
#1138: iwlist command failure.
--------------------------------+-------------------------------------------
Reporter: swagle at marvell.com | Owner: jcardona
Type: defect | Status: closed
Priority: blocker | Milestone: Trial-1
Component: wireless | Resolution: fixed
Keywords: |
--------------------------------+-------------------------------------------
Comment (by jcardona):
Replying to [comment:14 dcbw]:
> I audited the reset calls in the driver. There are three types; (1) USB
> port reset, (2) CMD_802_11_RESET soft-reset, and (3) EC power cut.
> (...)
Dan, I must apologize for sending you on a wild goose chase. I did jump
to a wrong conclusion on this a bit too early. Let me explain the problem
in more detail and better separate facts from speculation.
* pre 5.220.10 release
After the merge with the stable branch we encountered PHY register
corruption problems. We identified the cause as unprotected accesses to
the PHY by several tasks, and introduced a semaphore to make those
accesses atomic. Once we confirmed that the register corruption was
resolved, we went ahead with the release.
* 5.220.10p1, 5.220.10p2
This bug was reported. We could reproduce it on only one of the XOs in
our office (out of 10). We suspected the new semaphore, so we protected
PHY accesses by disabling interrupts instead. The new image did not
freeze. As we needed an image for the upcoming deadline, we decided to
release (5.220.10p3) without having found the root cause of the problem.
We added a JTAG connector to the XO that was failing. With the debugger
we could confirm that two tasks were blocked on the new semaphore. What
we could not determine is which task had taken the semaphore without
returning it (that information is not available with this RTOS). Looking
through the code, we found a USB reset function call that accessed the
PHY (and hence took the semaphore) while executing in interrupt context.
Trying to take a semaphore from an interrupt service routine may cause
suspension. What happens next depends on the RTOS: some freeze, some
return an error. The user guide just says that suspending from within an
ISR is "not allowed".
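To illustrate the hazard described above, here is a minimal sketch of one common way to avoid blocking in interrupt context: the handler tries the semaphore without waiting and defers the PHY work if a task already holds it. Everything in it is hypothetical (the `phy_sem_t` type, `phy_sem_try_take`, `usb_reset_isr`); it does not use the firmware's actual RTOS API, which this report does not name.

```c
#include <stdbool.h>

/* Hypothetical binary semaphore; NOT the firmware's RTOS API. */
typedef struct {
    bool taken;
} phy_sem_t;

/* Non-blocking take: returns true on success, false if already held.
 * On real hardware this test-and-set would have to be atomic. */
static bool phy_sem_try_take(phy_sem_t *s)
{
    if (s->taken)
        return false;
    s->taken = true;
    return true;
}

static void phy_sem_give(phy_sem_t *s)
{
    s->taken = false;
}

typedef enum { RESET_DONE, RESET_DEFERRED } reset_result_t;

/* Hypothetical USB reset handler running in interrupt context.
 * It must never suspend, so it only tries the semaphore and leaves
 * the PHY work to a task when the semaphore is busy. */
static reset_result_t usb_reset_isr(phy_sem_t *s, int *phy_writes)
{
    if (!phy_sem_try_take(s))
        return RESET_DEFERRED;  /* a task owns the PHY; do not wait */

    (*phy_writes)++;            /* stand-in for the PHY register access */
    phy_sem_give(s);
    return RESET_DONE;
}
```

In this sketch a deferred reset would have to be picked up later by a task; how that hand-off works is left out because it depends on the RTOS.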
We [comment:13 concluded that we had found the cause of the problem] and
went to sleep (with some bad dreams, I must admit...).
The next day we studied the code in more detail and followed all the
execution paths that would lead to taking this semaphore. We found a
function that, depending on the value of a calibration constant stored on
EEPROM, would return without releasing the semaphore. With the debugger
we confirmed that the error condition occurred on the XOs that were
freezing and not on the ones that were not. Another team is now looking
at this problem, trying to determine why the calibration value is out of
range. Is the calibration data incorrect, or are we dealing with an
EEPROM read error?
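The failure mode described above (a function that returns on an out-of-range calibration value without releasing the semaphore) can be sketched as follows. This is an illustration under stated assumptions, not the firmware code: the lock type, the function names, and the `CAL_MIN`/`CAL_MAX` range are all made up for the example.

```c
/* Hypothetical lock standing in for the RTOS semaphore. */
typedef struct {
    int held;   /* 1 while taken, 0 when free */
} cal_lock_t;

static void cal_lock_take(cal_lock_t *l) { l->held = 1; }
static void cal_lock_give(cal_lock_t *l) { l->held = 0; }

/* Assumed valid range for the calibration constant (invented values). */
#define CAL_MIN 0
#define CAL_MAX 63

/* Buggy shape: the range check returns early and the lock stays held,
 * so the next task that wants the PHY blocks forever. */
static int phy_apply_cal_leaky(cal_lock_t *l, int cal)
{
    cal_lock_take(l);
    if (cal < CAL_MIN || cal > CAL_MAX)
        return -1;              /* lock is never released on this path */
    /* ... PHY register writes would go here ... */
    cal_lock_give(l);
    return 0;
}

/* Fixed shape: every path leaves through one exit that releases the lock. */
static int phy_apply_cal_fixed(cal_lock_t *l, int cal)
{
    int rc = 0;

    cal_lock_take(l);
    if (cal < CAL_MIN || cal > CAL_MAX) {
        rc = -1;
        goto out;
    }
    /* ... PHY register writes would go here ... */
out:
    cal_lock_give(l);
    return rc;
}
```

With a single exit path the error is still reported to the caller, but an out-of-range calibration value can no longer leave the lock held.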
--
Ticket URL: <http://dev.laptop.org/ticket/1138#comment:16>
One Laptop Per Child <http://laptop.org/>