#12101 BLOC 4-softw: cl4: touchpad missing after reboot
Zarro Boogs per Child
bugtracker at laptop.org
Fri Feb 1 16:44:34 EST 2013
#12101: cl4: touchpad missing after reboot
---------------------------------+------------------------------------------
Reporter: pgf | Owner:
Type: defect | Status: new
Priority: blocker | Milestone: 4-software
Component: kernel | Version: Development source as of this date
Resolution: | Keywords: XO-4, touchpad
Next_action: reproduce | Verified: 0
Deployment_affected: | Blockedby:
Blocking: |
---------------------------------+------------------------------------------
Comment(by dsd):
Reproduced this on a cold boot. I have a feeling that this bug appears
quite often on a cold boot and not so much in a reboot loop.
There was no weird IRQ169 handler.
For an unknown reason we couldn't get anything except junk from the EC
serial port, and didn't want to reboot. So we used sdkit. Thanks to Paul
for his help looking at the details.
After a while the keyboard lag issue came into play as well (#12370), so
it is not clear whether we are debugging the failure to communicate with
the mouse, the keyboard lag (also seen on XO-1.75, #11543), or both.
{{{
ok d4290000 1000 mmap constant sp-base
ok sp-base 84 + l@ .
}}}
The SP code that runs on the other end of the ap-sp interface is in the
cforth git repository at src/platform/arm-xo-1.75/consoleio.c.
Register values on this confused system ("bad") compared with a working
unit ("good"):
{{{
SECURE_PROCESSOR_COMMAND   40 = 1f3 (bad), 00ff (good)
COMMAND_RETURN_STATUS      80 = e0 (bad), 1c (good)
COMMAND_FIFO_STATUS        c4 = 104 (bad), 100 (good)
PJ_RST_INTERRUPT           c8 = 20000 (bad), 0 (good)
PJ_INTERRUPT_MASK          cc = ~1 (both)
SP_INTERRUPT_REGISTER     218 = 802 (bad), 804 (good)
SP_INTERRUPT_MASK         21c = ~1 (bad), 0xfffffffd (good)
SP_CONTROL                220 = 1 (bad), 0 (good)
PJ_INTERRUPT_SET          234 = 20000 (bad), 0 (good)
}}}
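To make the bad/good comparison mechanical, here is a minimal sketch that
XORs each pair of values and lists the bit positions that differ. The
values are copied from the dump above, with "~1" written out as the 32-bit
value 0xfffffffe; the helper names (differing_bits, report) are made up for
this example and do not come from the kernel or cforth sources.

```python
# Register dump from this comment: offset -> (name, bad value, good value).
# "~1" is written out as the 32-bit value 0xfffffffe.
REGS = {
    0x40: ("SECURE_PROCESSOR_COMMAND", 0x1f3, 0x00ff),
    0x80: ("COMMAND_RETURN_STATUS", 0xe0, 0x1c),
    0xc4: ("COMMAND_FIFO_STATUS", 0x104, 0x100),
    0xc8: ("PJ_RST_INTERRUPT", 0x20000, 0x0),
    0xcc: ("PJ_INTERRUPT_MASK", 0xfffffffe, 0xfffffffe),
    0x218: ("SP_INTERRUPT_REGISTER", 0x802, 0x804),
    0x21c: ("SP_INTERRUPT_MASK", 0xfffffffe, 0xfffffffd),
    0x220: ("SP_CONTROL", 0x1, 0x0),
    0x234: ("PJ_INTERRUPT_SET", 0x20000, 0x0),
}


def differing_bits(bad, good):
    """Return the bit positions (0 = LSB) where two 32-bit values differ."""
    diff = (bad ^ good) & 0xffffffff
    return [bit for bit in range(32) if diff & (1 << bit)]


def report(regs=REGS):
    """Map each register name to its differing bits; identical pairs are skipped."""
    out = {}
    for offset in sorted(regs):
        name, bad, good = regs[offset]
        bits = differing_bits(bad, good)
        if bits:
            out[name] = bits
    return out


if __name__ == "__main__":
    for name, bits in report().items():
        print(f"{name}: bits {bits} differ")
```

For SP_INTERRUPT_MASK this reports bits 0 and 1, and for
COMMAND_FIFO_STATUS only bit 2. What each bit means still has to be taken
from the datasheet; the script only shows where the two units disagree.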
The COMMAND_FIFO_STATUS value shows that on the bad system there are 4
commands queued in the FIFO, which is the maximum the FIFO allows on MMP2
and MMP3 (according to the datasheet). We found a bug in olpc_keyboard: it
kept writing commands even when 4 commands were already queued. This is
fixed in da81b98b1e17.
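As an illustration of the kind of check involved (a sketch only, not the
contents of da81b98b1e17), the queue depth can be decoded from
COMMAND_FIFO_STATUS and a write refused when the FIFO is full. The field
layout is an assumption: the values 0x104 and 0x100 suggest the count sits
in the low bits, so a 3-bit mask is used here. All names below are
hypothetical.

```python
FIFO_MAX = 4           # maximum queued commands on MMP2/MMP3, per the comment
FIFO_COUNT_MASK = 0x7  # ASSUMPTION: the count occupies the low 3 bits


def fifo_count(status):
    """Number of commands currently queued, under the assumed field layout."""
    return status & FIFO_COUNT_MASK


def fifo_full(status):
    """True when no further command should be written."""
    return fifo_count(status) >= FIFO_MAX


def try_write_command(status, send, cmd):
    """Call send(cmd) only if the FIFO has room; return True if it was sent.

    `status` is the COMMAND_FIFO_STATUS value read just before the write and
    `send` is a hypothetical callable that performs the register write.
    """
    if fifo_full(status):
        return False
    send(cmd)
    return True


if __name__ == "__main__":
    sent = []
    print(try_write_command(0x100, sent.append, 0xff), sent)
    print(try_write_command(0x104, sent.append, 0xff), sent)
```

With the values from the dump, the write goes through for 0x100 (good) and
is refused for 0x104 (bad). A refused write is also a natural place to log
that the FIFO was full.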
SP_INTERRUPT_MASK shows that the command interrupt has been masked. This
explains why the queue isn't being emptied - the SP isn't listening.
In the SP code you can see that irq_handler() does mask the interrupt, and
applies some non-trivial logic for when to unmask it. It looks like we
have triggered a bug in this code, or confused it significantly with bad
data.
It is possible that the olpc_keyboard bug caused the SP to become
confused, but we haven't come up with a solid explanation to back that
up. For now we will go back to testing with the fix applied, and look at
adding logging for when the FIFO is full.
Other open questions:
* irq_handler() in the SP only processes one command per interrupt. Is
that correct? Would be interesting to run an experiment: SP masks the
interrupt, CPU queues two commands, SP unmasks interrupt, what happens
now? How many interrupts?
* Can olpc_kbd_pad_write() be called twice concurrently, or is there some
serialization happening at a higher layer? This code would be racy if run
concurrently in two different processes.
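The experiment in the first question can be reasoned about with a toy
model. The sketch below is not derived from the SP code or the datasheet.
It only contrasts two assumed behaviours: an interrupt that stays asserted
while the FIFO is non-empty ("level") and one that is latched once per
burst of writes ("edge"). The class and function names are hypothetical.

```python
class ToySP:
    """Toy model of an SP whose handler consumes one command per interrupt."""

    def __init__(self, level_triggered):
        self.level_triggered = level_triggered
        self.fifo = []
        self.masked = False
        self.pending = False  # latched request, used by the edge model
        self.interrupts = 0

    def queue(self, cmd):
        """CPU side: write a command into the FIFO."""
        self.fifo.append(cmd)
        self.pending = True
        self._deliver()

    def mask(self):
        self.masked = True

    def unmask(self):
        self.masked = False
        self._deliver()

    def _asserted(self):
        if self.level_triggered:
            return bool(self.fifo)  # asserted while commands remain
        return self.pending         # asserted only by the latched request

    def _deliver(self):
        while not self.masked and self._asserted():
            self.pending = False
            self.interrupts += 1
            self.fifo.pop(0)  # the handler processes exactly one command


def run_experiment(level_triggered):
    """SP masks, CPU queues two commands, SP unmasks.

    Returns (interrupts taken, commands left in the FIFO).
    """
    sp = ToySP(level_triggered)
    sp.mask()
    sp.queue("cmd1")
    sp.queue("cmd2")
    sp.unmask()
    return sp.interrupts, len(sp.fifo)


if __name__ == "__main__":
    print("level:", run_experiment(True))
    print("edge: ", run_experiment(False))
```

In the level model both commands are drained in two interrupts. In the
edge model only one interrupt fires and one command stays stuck in the
FIFO, which is the kind of outcome the real experiment would need to rule
out.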
--
Ticket URL: <http://dev.laptop.org/ticket/12101#comment:16>
One Laptop Per Child <http://laptop.org/>
OLPC bug tracking system