#9601 NORM Not Tri: Loss of serial Data Carrier Detect signal on suspend/resume
Zarro Boogs per Child
bugtracker at laptop.org
Wed Nov 4 01:29:59 EST 2009
#9601: Loss of serial Data Carrier Detect signal on suspend/resume
--------------------------+-------------------------------------------------
Reporter: dsaxena | Owner: dsaxena
Type: defect | Status: new
Priority: normal | Milestone: Not Triaged
Component: not assigned | Version: not specified
Keywords: | Next_action: diagnose
Verified: 0 | Deployment_affected:
Blockedby: | Blocking: 9420, 9458
--------------------------+-------------------------------------------------
I am working on debugging #9420 and #9458 and have determined that they
are both symptoms of the same underlying issue: When we return from
resume, we are losing the serial carrier signal (DCD).
{{{
Nov 4 05:00:39 localhost kernel: [ 99.754261] dcon_source_switch to CPU
Nov 4 05:00:39 localhost kernel: [ 99.757652] Pid: 2183, comm: bash Not tainted 2.6.30.1 #26
Nov 4 05:00:39 localhost kernel: [ 99.757660] Call Trace:
Nov 4 05:00:39 localhost kernel: [ 99.757684] [<b071adf2>] ? printk+0xf/0x15
Nov 4 05:00:39 localhost kernel: [ 99.757703] [<b05a7d85>] tty_hangup+0x21/0x33
Nov 4 05:00:39 localhost kernel: [ 99.757723] [<b040af55>] ? lapic_next_event+0x16/0x1a
Nov 4 05:00:39 localhost kernel: [ 99.757742] [<b0431bdb>] ? clockevents_program_event+0xba/0xc8
Nov 4 05:00:39 localhost kernel: [ 99.757765] [<b04328f0>] ? tick_dev_program_event+0x34/0xa2
Nov 4 05:00:39 localhost kernel: [ 99.757787] [<b05baf21>] check_modem_status+0x99/0x11e
Nov 4 05:00:39 localhost kernel: [ 99.757803] [<b05bc3f0>] serial8250_handle_port+0x239/0x25f
Nov 4 05:00:39 localhost kernel: [ 99.757823] [<b042c465>] ? hrtimer_interrupt+0x130/0x140
Nov 4 05:00:39 localhost kernel: [ 99.757840] [<b05bc463>] serial8250_interrupt+0x4d/0xca
Nov 4 05:00:39 localhost kernel: [ 99.757856] [<b0440ec9>] handle_IRQ_event+0x6c/0x12a
Nov 4 05:00:39 localhost kernel: [ 99.757871] [<b044218b>] handle_edge_irq+0xca/0x10d
Nov 4 05:00:39 localhost kernel: [ 99.757884] [<b04420c1>] ? handle_edge_irq+0x0/0x10d
Nov 4 05:00:39 localhost kernel: [ 99.757892] <IRQ> [<b040396e>] ? do_IRQ+0x34/0x73
Nov 4 05:00:39 localhost kernel: [ 99.757918] [<b0402ee9>] ? common_interrupt+0x29/0x30
Nov 4 05:00:39 localhost kernel: [ 99.757937] [<b071cb39>] ? _spin_unlock_irqrestore+0x12/0x2c
Nov 4 05:00:39 localhost kernel: [ 99.757953] [<b05bbfd6>] ? serial8250_set_termios+0x2a2/0x2c1
Nov 4 05:00:39 localhost kernel: [ 99.757968] [<b05ba7ba>] ? io_serial_out+0x0/0x15
Nov 4 05:00:39 localhost kernel: [ 99.757984] [<b05ba237>] ? uart_resume_port+0x8f/0x197
Nov 4 05:00:39 localhost kernel: [ 99.758002] [<b05bc8d6>] ? serial8250_resume_port+0x5c/0x5f
Nov 4 05:00:39 localhost kernel: [ 99.758017] [<b05bc8f7>] ? serial8250_resume+0x1e/0x22
Nov 4 05:00:39 localhost kernel: [ 99.758036] [<b05c131d>] ? platform_drv_resume+0xc/0xe
Nov 4 05:00:39 localhost kernel: [ 99.758050] [<b05c13dc>] ? platform_pm_resume+0x1f/0x25
Nov 4 05:00:39 localhost kernel: [ 99.758064] [<b05c2d5a>] ? pm_op+0x31/0x5b
Nov 4 05:00:39 localhost kernel: [ 99.758077] [<b05c32b0>] ? device_resume+0x7f/0x296
Nov 4 05:00:39 localhost kernel: [ 99.758094] [<b0439bd2>] ? suspend_devices_and_enter+0x138/0x165
Nov 4 05:00:39 localhost kernel: [ 99.758108] [<b0439d46>] ? enter_state+0x122/0x178
Nov 4 05:00:39 localhost kernel: [ 99.758122] [<b0439e31>] ? state_store+0x95/0xa9
Nov 4 05:00:39 localhost kernel: [ 99.758135] [<b0439d9c>] ? state_store+0x0/0xa9
Nov 4 05:00:39 localhost kernel: [ 99.758152] [<b053e70d>] ? kobj_attr_store+0x16/0x22
Nov 4 05:00:39 localhost kernel: [ 99.758168] [<b04aac61>] ? sysfs_write_file+0xbf/0xea
Nov 4 05:00:39 localhost kernel: [ 99.758188] [<b0470eeb>] ? vfs_write+0x8a/0x103
Nov 4 05:00:39 localhost kernel: [ 99.758201] [<b04aaba2>] ? sysfs_write_file+0x0/0xea
Nov 4 05:00:39 localhost kernel: [ 99.758216] [<b0470ffb>] ? sys_write+0x3b/0x60
Nov 4 05:00:39 localhost kernel: [ 99.758229] [<b04028f4>] ? sysenter_do_call+0x12/0x26
}}}
In the trace above, an interrupt fires as soon as we re-enable the serial
port. The serial interrupt path calls {{{check_modem_status()}}}, which
finds the UART_MSR_DDCD (Delta DCD) bit set in the MSR. The DCD bit itself
is clear, so we call {{{tty_hangup()}}}, which sends a {{{SIGHUP}}} to the
shell (hence the "{{{ttyS0 main process (2388) killed by HUP signal}}}"
message in #9458) and clears the {{{info.port.tty}}} pointer (as seen in
#9420).
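The decision described above can be modeled in user space. This is a minimal sketch, not the driver's code. The two bit masks match the values in the kernel's `include/linux/serial_reg.h`; `classify_msr`, `enum dcd_action`, and the `check_cd` parameter are names invented for this illustration. `check_cd` encodes an assumption that should be verified against the 2.6.30 source: that the serial core only acts on a carrier change when carrier checking is enabled for the port (i.e. CLOCAL is not set).

```c
#include <stdbool.h>

/* Modem Status Register bits, same values as include/linux/serial_reg.h */
#define UART_MSR_DCD  0x80  /* current state of Data Carrier Detect */
#define UART_MSR_DDCD 0x08  /* "delta DCD": DCD changed since last MSR read */

/* Hypothetical result type for this sketch only. */
enum dcd_action {
	DCD_UNCHANGED,   /* no delta reported, nothing to do */
	DCD_IGNORED,     /* delta reported, but carrier checking is off */
	DCD_CARRIER_UP,  /* delta reported and carrier is now present */
	DCD_HANGUP       /* delta reported and carrier is gone: hang up the tty */
};

/*
 * Classify one MSR value the way the ticket describes the interrupt path:
 * a set DDCD bit with a clear DCD bit leads to a hangup.
 * check_cd is an assumed stand-in for "the port is not in CLOCAL mode".
 */
enum dcd_action classify_msr(unsigned int msr, bool check_cd)
{
	if (!(msr & UART_MSR_DDCD))
		return DCD_UNCHANGED;
	if (!check_cd)
		return DCD_IGNORED;
	return (msr & UART_MSR_DCD) ? DCD_CARRIER_UP : DCD_HANGUP;
}
```

Under these assumptions, the case reported in this ticket corresponds to `classify_msr(0x08, true)`: the delta bit is set, the carrier bit is clear, and the result is `DCD_HANGUP`.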
The following simple patch is a temporary workaround that removes the DDCD
check from the kernel:
{{{
diff --git a/drivers/serial/8250.c b/drivers/serial/8250.c
index a0127e9..3954808 100644
--- a/drivers/serial/8250.c
+++ b/drivers/serial/8250.c
@@ -1498,8 +1498,8 @@ static unsigned int check_modem_status(struct uart_8250_port *up)
 			up->port.icount.rng++;
 		if (status & UART_MSR_DDSR)
 			up->port.icount.dsr++;
-		if (status & UART_MSR_DDCD)
-			uart_handle_dcd_change(&up->port, status & UART_MSR_DCD);
+//		if (status & UART_MSR_DDCD)
+//			uart_handle_dcd_change(&up->port, status & UART_MSR_DCD);
 		if (status & UART_MSR_DCTS)
 			uart_handle_cts_change(&up->port, status & UART_MSR_CTS);
}}}
As seen below, with this patch the console shell does not die during suspend/resume.
I've also been able to run a "{{{while true; do echo mem >
/sys/power/state; done}}}" loop without running into #9420.
{{{
[root@localhost dev]# ps
  PID TTY          TIME CMD
 2165 ttyS0    00:00:00 bash
 2452 ttyS0    00:00:00 ps
[root@localhost dev]# echo mem > /sys/power/state
+r[root@localhost dev]#
[root@localhost dev]# ps
  PID TTY          TIME CMD
 2165 ttyS0    00:00:00 bash
 2487 ttyS0    00:00:00 ps
}}}
This is not a real solution; it only covers up the underlying issue. Next,
I need to analyze further whether this is purely a software issue, and the
folks closer to the HW need to check whether something at the board level
is causing us to lose carrier sense (we don't see this on XO-1 AFAIK,
though I need to verify that).
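One possible aid for that analysis is a small user-space probe that reports the carrier state the driver sees, run before and after a suspend/resume cycle. The sketch below uses the standard `TIOCMGET` ioctl and the `TIOCM_CAR` flag. The helper names `carrier_from_bits` and `read_carrier` are made up for this example, and the device path is an assumption to adjust as needed. Opening and closing a serial device can itself change modem control lines on some setups, so treat the readings as a hint rather than proof.

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>

/* Return 1 if the carrier-detect flag is set in a TIOCMGET result, else 0. */
int carrier_from_bits(int modem_bits)
{
	return (modem_bits & TIOCM_CAR) != 0;
}

/*
 * Query the carrier state of a serial device (hypothetical helper).
 * Returns 1 if carrier is asserted, 0 if it is not, -1 on any error.
 */
int read_carrier(const char *path)
{
	int bits = 0;
	/* O_NONBLOCK so that open() does not wait for carrier to appear. */
	int fd = open(path, O_RDONLY | O_NOCTTY | O_NONBLOCK);

	if (fd < 0)
		return -1;
	if (ioctl(fd, TIOCMGET, &bits) < 0) {
		close(fd);
		return -1;
	}
	close(fd);
	return carrier_from_bits(bits);
}
```

Calling `read_carrier("/dev/ttyS0")` before and after `echo mem > /sys/power/state` would show whether the DCD line, as seen by the driver, is still deasserted once resume has completed or only glitches at the moment the port is re-enabled.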
--
Ticket URL: <http://dev.laptop.org/ticket/9601>
One Laptop Per Child <http://laptop.org/>
OLPC bug tracking system