#9601 NORM Not Tri: Loss of serial Data Carrier Detect signal on suspend/resume

Zarro Boogs per Child bugtracker at laptop.org
Wed Nov 4 01:29:59 EST 2009


#9601: Loss of serial Data Carrier Detect signal on suspend/resume
--------------------------+-------------------------------------------------
 Reporter:  dsaxena       |                 Owner:  dsaxena      
     Type:  defect        |                Status:  new          
 Priority:  normal        |             Milestone:  Not Triaged  
Component:  not assigned  |               Version:  not specified
 Keywords:                |           Next_action:  diagnose     
 Verified:  0             |   Deployment_affected:               
Blockedby:                |              Blocking:  9420, 9458   
--------------------------+-------------------------------------------------
 I am debugging #9420 and #9458 and have determined that they are both
 symptoms of the same underlying issue: when we return from resume, we are
 losing the serial carrier signal (DCD).

 {{{
 Nov  4 05:00:39 localhost kernel: [   99.754261] dcon_source_switch to CPU
 Nov  4 05:00:39 localhost kernel: [   99.757652] Pid: 2183, comm: bash Not tainted 2.6.30.1 #26
 Nov  4 05:00:39 localhost kernel: [   99.757660] Call Trace:
 Nov  4 05:00:39 localhost kernel: [   99.757684]  [<b071adf2>] ? printk+0xf/0x15
 Nov  4 05:00:39 localhost kernel: [   99.757703]  [<b05a7d85>] tty_hangup+0x21/0x33
 Nov  4 05:00:39 localhost kernel: [   99.757723]  [<b040af55>] ? lapic_next_event+0x16/0x1a
 Nov  4 05:00:39 localhost kernel: [   99.757742]  [<b0431bdb>] ? clockevents_program_event+0xba/0xc8
 Nov  4 05:00:39 localhost kernel: [   99.757765]  [<b04328f0>] ? tick_dev_program_event+0x34/0xa2
 Nov  4 05:00:39 localhost kernel: [   99.757787]  [<b05baf21>] check_modem_status+0x99/0x11e
 Nov  4 05:00:39 localhost kernel: [   99.757803]  [<b05bc3f0>] serial8250_handle_port+0x239/0x25f
 Nov  4 05:00:39 localhost kernel: [   99.757823]  [<b042c465>] ? hrtimer_interrupt+0x130/0x140
 Nov  4 05:00:39 localhost kernel: [   99.757840]  [<b05bc463>] serial8250_interrupt+0x4d/0xca
 Nov  4 05:00:39 localhost kernel: [   99.757856]  [<b0440ec9>] handle_IRQ_event+0x6c/0x12a
 Nov  4 05:00:39 localhost kernel: [   99.757871]  [<b044218b>] handle_edge_irq+0xca/0x10d
 Nov  4 05:00:39 localhost kernel: [   99.757884]  [<b04420c1>] ? handle_edge_irq+0x0/0x10d
 Nov  4 05:00:39 localhost kernel: [   99.757892]  <IRQ>  [<b040396e>] ? do_IRQ+0x34/0x73
 Nov  4 05:00:39 localhost kernel: [   99.757918]  [<b0402ee9>] ? common_interrupt+0x29/0x30
 Nov  4 05:00:39 localhost kernel: [   99.757937]  [<b071cb39>] ? _spin_unlock_irqrestore+0x12/0x2c
 Nov  4 05:00:39 localhost kernel: [   99.757953]  [<b05bbfd6>] ? serial8250_set_termios+0x2a2/0x2c1
 Nov  4 05:00:39 localhost kernel: [   99.757968]  [<b05ba7ba>] ? io_serial_out+0x0/0x15
 Nov  4 05:00:39 localhost kernel: [   99.757984]  [<b05ba237>] ? uart_resume_port+0x8f/0x197
 Nov  4 05:00:39 localhost kernel: [   99.758002]  [<b05bc8d6>] ? serial8250_resume_port+0x5c/0x5f
 Nov  4 05:00:39 localhost kernel: [   99.758017]  [<b05bc8f7>] ? serial8250_resume+0x1e/0x22
 Nov  4 05:00:39 localhost kernel: [   99.758036]  [<b05c131d>] ? platform_drv_resume+0xc/0xe
 Nov  4 05:00:39 localhost kernel: [   99.758050]  [<b05c13dc>] ? platform_pm_resume+0x1f/0x25
 Nov  4 05:00:39 localhost kernel: [   99.758064]  [<b05c2d5a>] ? pm_op+0x31/0x5b
 Nov  4 05:00:39 localhost kernel: [   99.758077]  [<b05c32b0>] ? device_resume+0x7f/0x296
 Nov  4 05:00:39 localhost kernel: [   99.758094]  [<b0439bd2>] ? suspend_devices_and_enter+0x138/0x165
 Nov  4 05:00:39 localhost kernel: [   99.758108]  [<b0439d46>] ? enter_state+0x122/0x178
 Nov  4 05:00:39 localhost kernel: [   99.758122]  [<b0439e31>] ? state_store+0x95/0xa9
 Nov  4 05:00:39 localhost kernel: [   99.758135]  [<b0439d9c>] ? state_store+0x0/0xa9
 Nov  4 05:00:39 localhost kernel: [   99.758152]  [<b053e70d>] ? kobj_attr_store+0x16/0x22
 Nov  4 05:00:39 localhost kernel: [   99.758168]  [<b04aac61>] ? sysfs_write_file+0xbf/0xea
 Nov  4 05:00:39 localhost kernel: [   99.758188]  [<b0470eeb>] ? vfs_write+0x8a/0x103
 Nov  4 05:00:39 localhost kernel: [   99.758201]  [<b04aaba2>] ? sysfs_write_file+0x0/0xea
 Nov  4 05:00:39 localhost kernel: [   99.758216]  [<b0470ffb>] ? sys_write+0x3b/0x60
 Nov  4 05:00:39 localhost kernel: [   99.758229]  [<b04028f4>] ? sysenter_do_call+0x12/0x26
 }}}

 What the above trace shows is that as soon as we re-enable the serial
 port, we get an interrupt. In the serial interrupt path we call
 {{{check_modem_status()}}} and see that the UART_MSR_DDCD (Delta DCD) bit
 in the MSR is set. On checking the DCD bit we find it clear, so we call
 {{{tty_hangup()}}}, which ends up sending a {{{SIGHUP}}} to the shell
 (thus the "{{{ttyS0 main process (2388) killed by HUP signal}}}" message
 in #9458) and clearing the {{{info.port.tty}}} pointer (as in #9420).
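
 The decision described above can be sketched in isolation. This is only an
 illustration, not the driver code: {{{msr_causes_hangup()}}} is a
 hypothetical helper name, and the sketch ignores any port flags the real
 driver may consult before hanging up. The bit values are the ones defined in
 the kernel's {{{linux/serial_reg.h}}}.

```c
#include <stdbool.h>
#include <stdint.h>

/* Modem Status Register bits, values as in the kernel's <linux/serial_reg.h>. */
#define UART_MSR_DCD  0x80  /* Data Carrier Detect: current level of the line */
#define UART_MSR_DDCD 0x08  /* Delta DCD: DCD changed since the last MSR read */

/*
 * Hypothetical helper (not a kernel function): returns true when an MSR
 * value matches the case seen in the trace, i.e. a DCD change is reported
 * and the carrier is now down, which is what leads to tty_hangup().
 */
static bool msr_causes_hangup(uint8_t msr)
{
    if (!(msr & UART_MSR_DDCD))
        return false;              /* no carrier change reported */
    return !(msr & UART_MSR_DCD);  /* change reported and carrier is low */
}
```

 With these definitions, an MSR value of 0x08 (DDCD set, DCD clear) is the
 hangup case, while 0x88 (DDCD set, DCD set) is a carrier-up transition and
 does not hang up.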

 The following simple patch is a temporary workaround that removes the DDCD
 check from the kernel:

 {{{
 diff --git a/drivers/serial/8250.c b/drivers/serial/8250.c
 index a0127e9..3954808 100644
 --- a/drivers/serial/8250.c
 +++ b/drivers/serial/8250.c
 @@ -1498,8 +1498,8 @@ static unsigned int check_modem_status(struct uart_8250_port *up)
                         up->port.icount.rng++;
                 if (status & UART_MSR_DDSR)
                         up->port.icount.dsr++;
 -               if (status & UART_MSR_DDCD)
 -                       uart_handle_dcd_change(&up->port, status & UART_MSR_DCD);
 +//             if (status & UART_MSR_DDCD)
 +//                     uart_handle_dcd_change(&up->port, status & UART_MSR_DCD);
                 if (status & UART_MSR_DCTS)
                         uart_handle_cts_change(&up->port, status & UART_MSR_CTS);
 }}}


 As seen below, with the patch applied the console shell does not die
 during suspend/resume. I've also been able to run a "{{{while true; do
 echo mem > /sys/power/state; done}}}" loop without running into #9420.

 {{{
 [root@localhost dev]# ps
  PID TTY          TIME CMD
 2165 ttyS0    00:00:00 bash
 2452 ttyS0    00:00:00 ps
 [root@localhost dev]# echo mem > /sys/power/state

 +r[root@localhost dev]#
 [root@localhost dev]# ps
  PID TTY          TIME CMD
 2165 ttyS0    00:00:00 bash
 2487 ttyS0    00:00:00 ps
 }}}


 This is not really a solution; it just covers up the underlying issue.
 What needs to be done next is further analysis on my end to see whether
 it is completely a software issue, and input from the folks closer to the
 HW on whether something is happening at the board level that causes us to
 lose carrier sense (we don't see this on XO-1 AFAIK, though I need to
 verify that).
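
 As one possible aid for that analysis, the carrier state can also be
 sampled from user space with the standard {{{TIOCMGET}}} ioctl. The sketch
 below is an assumption about how one might check it, not something that was
 run for this ticket; {{{dcd_from_modem_bits()}}} and {{{read_dcd()}}} are
 hypothetical helper names.

```c
#include <sys/ioctl.h>

/* Hypothetical helper: 1 if the carrier bit is set in a TIOCMGET result, else 0. */
static int dcd_from_modem_bits(int bits)
{
    return (bits & TIOCM_CAR) ? 1 : 0;
}

/*
 * Hypothetical helper: returns 1 if DCD is asserted on the open tty
 * descriptor, 0 if it is not, and -1 if the ioctl fails (for example
 * because fd is not a serial tty).
 */
static int read_dcd(int fd)
{
    int bits = 0;

    if (ioctl(fd, TIOCMGET, &bits) < 0)
        return -1;
    return dcd_from_modem_bits(bits);
}
```

 Calling {{{read_dcd()}}} on a descriptor for the serial port (opened with
 {{{O_RDONLY | O_NOCTTY | O_NONBLOCK}}}, for instance) before suspend and
 again after resume would show whether the line itself reads low afterwards
 or only the delta bit was spurious.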

-- 
Ticket URL: <http://dev.laptop.org/ticket/9601>
One Laptop Per Child <http://laptop.org/>
OLPC bug tracking system

