profiling the resume path

Marcelo Tosatti marcelo at kvack.org
Thu Sep 6 03:16:27 EDT 2007


Hi Thomas,

I found something that looks strange behaviour while profiling the
resume-from-RAM path on OLPC.

A little explanation on traces: they have been gathered using Arnaldo's
refactoring of latencytrace patches 

>From 3e81bb2749ab132f7ff68a22880754e206d7986a Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo <acme at ghostprotocols.net>
Date: Wed, 25 Jul 2007 15:12:22 -0300
Subject: [PATCH 1/9] [MCOUNT]: Add basic support for gcc profiler
instrumentation

If CONFIG_MCOUNT is selected and /proc/sys/kernel/mcount_enabled is set to a
non-zero value the mcount routine will be called everytime we enter a kernel
function that is not marked with the "notrace" attribute.

Then it records that into a ring buffer.

so, 

This "no need resched!" printk was added inside the 

                while (!need_resched()) {
                        void (*idle)(void);
			...
		}

loop of cpu_idle().

And I gathered two different traces of the exact same kernel.

Trace #1:

[    0.059910] ohci_hcd 0000:00:0f.4: powerup ports
[    0.060119] no need resched!
[    0.070230] no need resched!
[    0.097732] no need resched!
[    0.106032] tick_nohz_stop_sched_tick: next_jiffies = 4294947976, last_jiffies = 4294947974 jiffies = 4294947974
[    0.106174] tick_nohz_stop_sched_tick: expires=[106,800000000]
[    0.106270] tick_nohz_stop_sched_tick: now=[106,786802612]
[    0.106369] no need resched!
[    0.119478] tick_nohz_stop_sched_tick: next_jiffies = 4294947978, last_jiffies = 4294947976 jiffies = 4294947976
[    0.119618] tick_nohz_stop_sched_tick: expires=[106,820000000]
[    0.119714] tick_nohz_stop_sched_tick: now=[106,800253059]
[    0.119813] no need resched!

Then the resume path restarts at (this is from mctrace, I added another
msleep right after msleep(20) for diagnosis).

[    0.119336] ohci_pci_resume+0x8c/0xa0 -> msleep+0xb/0x27

Trace #2:

Another run with the same kernel.

[    0.059903] ohci_hcd 0000:00:0f.4: powerup ports
[    0.060108] no need resched!
[    0.070225] no need resched!
[    0.097726] no need resched!
[    0.125226] no need resched!
[    0.152728] no need resched!
[    0.180230] no need resched!
[    0.206887] tick_nohz_stop_sched_tick: next_jiffies = 4294942086,
last_jiffies = 4294942084 jiffies = 4294942084
[    0.207028] tick_nohz_stop_sched_tick: expires=[47,900000000]
[    0.207123] tick_nohz_stop_sched_tick: now=[47,886800917]

Which restarts at

[    0.220191] ohci_pci_resume+0x8c/0xa0 -> msleep+0xb/0x27

Sorry, but how the hell that can happen? 

Both runs look totally hosed since the msleep(20) call from
ohci_pci_resume happens at 59ms, and we have an interrupt at 97ms (on
both traces) which should get the system out of the idle loop.

Full traces:
http://dev.laptop.org/~marcelo/mctrace/trace1.txt
http://dev.laptop.org/~marcelo/mctrace/trace2.txt

Whoever sets TIF_NEED_RESCHED is not doing its job properly?



More information about the Devel mailing list