#11357 HIGH 1.75-so: Boot freezing on third clock dot

Thu Oct 27 11:13:25 EDT 2011

#11357: Boot freezing on third clock dot
-----------------------------------+----------------------------------------
           Reporter:  tonyforster  |       Owner:  saadia                           
               Type:  defect       |      Status:  new                              
           Priority:  high         |   Milestone:  1.75-software                    
          Component:  kernel       |     Version:  Development build as of this date
         Resolution:               |    Keywords:                                   
        Next_action:  diagnose     |    Verified:  0                                
Deployment_affected:               |   Blockedby:                                   
           Blocking:               |  
-----------------------------------+----------------------------------------

Comment(by dsd):

 Saadia tested and found that lockdep is unhappy indeed (even on successful
 boots):

 {{{
 [  714.756861] =================================
 [  714.756866] [ INFO: inconsistent lock state ]
 [  714.766984] 3.0.0-00173-gb67f6bf-dirty #671
 [  714.771134] ---------------------------------
 [  714.771134] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
 [  714.781424] swapper/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
 [  714.781424]  (&(&k->k_lock)->rlock){?.+...}, at: [<c036d950>]
 klist_next+0x18/0xb4
 [  714.786350] {HARDIRQ-ON-W} state was registered at:
 [  714.793913]   [<c006ef6c>] __lock_acquire+0x60c/0x1724
 [  714.803862]   [<c00704f0>] lock_acquire+0x60/0x74
 [  714.803862]   [<c0377eb8>] _raw_spin_lock+0x40/0x50
 [  714.808531]   [<c036d950>] klist_next+0x18/0xb4
 [  714.813392]   [<c01c2da0>] bus_for_each_dev+0x58/0x84
 [  714.817897]   [<c01c3530>] bus_add_driver+0xbc/0x250
 [  714.822919]   [<c01c43f8>] driver_register+0xa8/0x138
 [  714.832878]   [<c002b3dc>] do_one_initcall+0x90/0x164
 [  714.837911]   [<c00089a8>] kernel_init+0x74/0x110
 [  714.842588]   [<c0031cb8>] kernel_thread_exit+0x0/0x8
 [  714.847613] irq event stamp: 113584
 [  714.847613] hardirqs last  enabled at (113581): [<c0031d18>]
 default_idle+0x24/0x2c
 [  714.851067] hardirqs last disabled at (113582): [<c00308b4>]
 __irq_svc+0x34/0xac
 [  714.858684] softirqs last  enabled at (113584): [<c004b1f0>]
 irq_enter+0x44/0x70
 [  714.866033] softirqs last disabled at (113583): [<c004b1e4>]
 irq_enter+0x38/0x70
 [  714.880739]
 [  714.880739] other info that might help us debug this:
 [  714.887218]  Possible unsafe locking scenario:
 [  714.887218]
 [  714.887225]        CPU0
 [  714.893092]        ----
 [  714.895513]   lock(&(&k->k_lock)->rlock);
 [  714.897935]   <Interrupt>
 [  714.901923]     lock(&(&k->k_lock)->rlock);
 [  714.904516]
 [  714.908682]  *** DEADLOCK ***
 [  714.908682]
 [  714.914554] no locks held by swapper/0.
 [  714.918361]
 [  714.918361] stack backtrace:
 [  714.918365] [<c003655c>] (unwind_backtrace+0x0/0x11c) from [<c0372edc>]
 (print_usage_bug.part.28+0x208/0x264)
 [  714.922709] [<c0372edc>] (print_usage_bug.part.28+0x208/0x264) from
 [<c006e770>] (mark_lock+0x418/0x608)
 [  714.932564] [<c006e770>] (mark_lock+0x418/0x608) from [<c006eee8>]
 (__lock_acquire+0x588/0x1724)
 [  714.941986] [<c006eee8>] (__lock_acquire+0x588/0x1724) from
 [<c00704f0>] (lock_acquire+0x60/0x74)
 [  714.950714] [<c00704f0>] (lock_acquire+0x60/0x74) from [<c0377eb8>]
 (_raw_spin_lock+0x40/0x50)
 [  714.959530] [<c0377eb8>] (_raw_spin_lock+0x40/0x50) from [<c036d950>]
 (klist_next+0x18/0xb4)
 [  714.968087] [<c036d950>] (klist_next+0x18/0xb4) from [<c01c485c>]
 (class_dev_iter_next+0x10/0x40)
 [  714.976478] [<c01c485c>] (class_dev_iter_next+0x10/0x40) from
 [<c01c4e4c>] (class_find_device+0x8c/0xb4)
 [  714.994731] [<c01c4e4c>] (class_find_device+0x8c/0xb4) from
 [<c02426e0>] (power_supply_get_by_name+0x1c/0x34)
 [  715.004593] [<c02426e0>] (power_supply_get_by_name+0x1c/0x34) from
 [<c01cfd6c>] (olpc_ec_1_75_irq+0x254/0x388)
 [  715.004593] [<c01cfd6c>] (olpc_ec_1_75_irq+0x254/0x388) from
 [<c008a548>] (handle_irq_event_percpu+0x30/0x174)
 [  715.014541] [<c008a548>] (handle_irq_event_percpu+0x30/0x174) from
 [<c008a6c8>] (handle_irq_event+0x3c/0x5c)
 [  715.024482] [<c008a6c8>] (handle_irq_event+0x3c/0x5c) from [<c008c2b0>]
 (handle_level_irq+0xb8/0xe8)
 [  715.043331] [<c008c2b0>] (handle_level_irq+0xb8/0xe8) from [<c008a100>]
 (generic_handle_irq+0x20/0x30)
 [  715.043331] [<c008a100>] (generic_handle_irq+0x20/0x30) from
 [<c002b060>] (asm_do_IRQ+0x60/0x84)
 [  715.052578] [<c002b060>] (asm_do_IRQ+0x60/0x84) from [<c00308e0>]
 (__irq_svc+0x60/0xac)
 [  715.061311] Exception stack(0xc04c3f80 to 0xc04c3fc8)
 [  715.069256] 3f80: 00000001 00000004 c04c3fb0 c0031cf4 c04c2000 c04c869c
 c04f30c4 c04c8694
 [  715.074277] 3fa0: 00004059 560f5815 00000000 00000000 00000000 c04c3fc8
 c0070ccc c0031e64
 [  715.082400] 3fc0: 20000013 ffffffff
 [  715.090518] [<c00308e0>] (__irq_svc+0x60/0xac) from [<c0031e64>]
 (cpu_idle+0x50/0xac)
 [  715.101768] [<c0031e64>] (cpu_idle+0x50/0xac) from [<c00088e0>]
 (start_kernel+0x29c/0x2f0)
 [  715.101768] [<c00088e0>] (start_kernel+0x29c/0x2f0) from [<0000803c>]
 (0x803c)
 }}}

 power_supply_get_by_name() iterates a klist, and klist_next() takes the
 klist lock with plain spin_lock(), which does not disable IRQs. An
 interrupt can therefore arrive while that lock is already held, and an IRQ
 handler that walks the same klist would then spin on the lock forever.
 It therefore seems unsafe to use anything klist-related from IRQ context,
 as olpc_ec_1_75_irq() does here. This is also discussed in
 http://kerneltrap.org/mailarchive/linux-kernel/2010/4/20/4560708
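
 To make the scenario in the lockdep report concrete, here is a minimal
 user-space sketch of it. It is a toy model, not kernel code: toy_lock,
 toy_try_lock(), the two handlers and run_deferred_work() are hypothetical
 names invented for illustration, and the "interrupt" is simulated by an
 ordinary function call made while the lock is held. The sketch only shows
 why taking the same lock from the handler cannot succeed, and why moving
 the lookup out of the handler avoids the problem. It does not claim to be
 the fix applied for this ticket.

```c
#include <stdbool.h>

/* Toy single-CPU model of a spinlock: just a "held" flag (hypothetical). */
struct toy_lock {
    bool held;
};

/* Set by the deferring handler, consumed later in process context. */
static bool work_pending = false;

/*
 * Try to take the lock. Returns false at the point where a real
 * spin_lock() on one CPU would spin forever, i.e. deadlock.
 */
static bool toy_try_lock(struct toy_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

static void toy_unlock(struct toy_lock *l)
{
    l->held = false;
}

/* Unsafe pattern: the handler takes the list lock itself. */
static bool irq_handler_direct(struct toy_lock *l)
{
    if (!toy_try_lock(l))
        return false;           /* lock already held: would deadlock */
    toy_unlock(l);
    return true;
}

/* Safer pattern: the handler only records that work is needed. */
static bool irq_handler_deferred(struct toy_lock *l)
{
    (void)l;                    /* the lock is never touched here */
    work_pending = true;
    return true;
}

/*
 * Process context walking the list. The lock is taken without
 * disabling interrupts, so the handler may run while it is held.
 * Returns whether the handler could complete.
 */
static bool walk_list_interrupted(struct toy_lock *l,
                                  bool (*handler)(struct toy_lock *))
{
    bool ok;

    toy_try_lock(l);            /* models spin_lock(): IRQs stay on */
    ok = handler(l);            /* simulated interrupt arrives here */
    toy_unlock(l);
    return ok;
}

/* Later, outside interrupt context, the deferred lookup takes the lock. */
static bool run_deferred_work(struct toy_lock *l)
{
    if (!work_pending)
        return false;
    work_pending = false;
    if (!toy_try_lock(l))
        return false;
    toy_unlock(l);
    return true;
}
```

 With a fresh lock, walk_list_interrupted(&k, irq_handler_direct) reports
 failure, which corresponds to the "lock / <Interrupt> / lock" sequence in
 the report. The deferred variant completes, and run_deferred_work(&k)
 then takes the lock safely. In the kernel, this kind of deferral is
 commonly done with a workqueue via schedule_work(); whether that suits
 the EC driver is an assumption to be checked.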

-- 
Ticket URL: <http://dev.laptop.org/ticket/11357#comment:9>
One Laptop Per Child <http://laptop.org/>
OLPC bug tracking system