#11982 BLOC 12.1.0: XO-1.5 os16 runin hang
Zarro Boogs per Child
bugtracker at laptop.org
Wed Jul 25 17:36:07 EDT 2012
#11982: XO-1.5 os16 runin hang
--------------------------------+-------------------------------------------
Reporter: Quozl | Owner: dsd
Type: defect | Status: assigned
Priority: blocker | Milestone: 12.1.0
Component: kernel | Version: Development build as of this date
Resolution: | Keywords:
Next_action: diagnose | Verified: 0
Deployment_affected: | Blockedby:
Blocking: |
--------------------------------+-------------------------------------------
Comment(by dsd):
Today's update:
My overnight testing confirms that running the 3.1 kernel and runin
version from 12.1.0 on top of 11.3.1 does not reproduce the issue. So it
seems fair to say that this hang is either triggered by a userspace
application, or is a kernel bug that is only now exposed due to a change
in userspace.
We found another case of the hang occurring in read_unlock(&tasklist_lock)
but this one also printed "dcon_source_switch to CPU" before the call to
read_unlock() had returned (it never did). This suggests that the problem
is not actually in the unlocking, it's in some other task that is being
scheduled at that point due to kernel pre-emption being re-enabled when
unlocking a spinlock.
Disabling kernel preemption helped to confirm this - the hang was then
postponed until thaw_processes() calls schedule() at the end - and
schedule() never returned. This was confirmed on 2 systems.
(The DCON is not really a suspect, because in the cases where kernel
preemption was disabled, the DCON was fully unfrozen before tasks were
restarted.)
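
To illustrate the preemption reasoning above, here is a toy user-space
model in Python. It is not kernel code: ToyCPU, Hang, hanging_task and
trace_unlock are made-up names, and the model only assumes that taking
the lock disables preemption and that a pending reschedule runs as soon
as the unlock re-enables it. It shows why the last thing seen is the
entry to read_unlock() even though the task that hangs is a different
one.

```python
class Hang(Exception):
    """Stands in for a task that never gives the CPU back."""


class ToyCPU:
    """Toy model of one CPU's preemption state (hypothetical, not kernel code)."""

    def __init__(self, next_task):
        self.preempt_count = 0      # > 0 means preemption is disabled
        self.need_resched = False   # set when another task became runnable
        self.next_task = next_task  # callable run when we switch tasks
        self.log = []

    def read_lock(self):
        # Taking the lock disables preemption.
        self.preempt_count += 1
        self.log.append("read_lock")

    def read_unlock(self):
        self.log.append("read_unlock: enter")
        self.preempt_count -= 1
        # Preemption is enabled again: a pending reschedule happens here,
        # before read_unlock() has returned to its caller.
        if self.preempt_count == 0 and self.need_resched:
            self.schedule()
        self.log.append("read_unlock: return")

    def schedule(self):
        self.need_resched = False
        self.log.append("schedule: switch to next task")
        self.next_task()


def hanging_task():
    # Assumption for the sketch: the newly scheduled task never returns.
    raise Hang()


def trace_unlock(next_task):
    """Take and release the lock with a reschedule pending; return the log."""
    cpu = ToyCPU(next_task)
    cpu.read_lock()
    cpu.need_resched = True  # a wakeup arrives while the lock is held
    try:
        cpu.read_unlock()
    except Hang:
        cpu.log.append("HANG: read_unlock never returned")
    return cpu.log
```

With trace_unlock(hanging_task) the log ends at the switch to the next
task and "read_unlock: return" never appears, which matches the
symptom described above. With a task that returns normally, for example
trace_unlock(lambda: None), the unlock completes.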
Sam has seen two instances of a hang that occur at a slightly later stage,
after all tasks have been resumed and even after the wifi card has been
re-detected. So perhaps we don't always see this immediately during
resume.
We are currently testing CONFIG_HARDLOCKUP_DETECTOR (early indications
suggest that this doesn't catch anything) and a modified resume routine
where debug info is printed from __schedule(), to hopefully tell us which
process is scheduled immediately before the hang.
--
Ticket URL: <http://dev.laptop.org/ticket/11982#comment:22>
One Laptop Per Child <http://laptop.org/>
OLPC bug tracking system