sugar start-up profiling

riccardo riccardo.lucchese at gmail.com
Tue Jul 29 10:15:07 EDT 2008


Problem: sugar starts up slowly on the xo

Tests were ran on an xo after a clean joyride-2181 install plus
sucrose's activities.

A graph of the ending part of the boot process can be found at:
http://dev.laptop.org/~rlucchese/boot/boot.stats.svg
(http://dev.laptop.org/~rlucchese/boot/boot.stats)

It was obtained by running:
$ picker -t 75 -o /tmp/boot.stats &
$ grapher -i boot.stats -c10

The sampling period begins in `start()' in haldaemon's init script and
lasts 75 seconds.

The following tab shows cpu usage of the 10 processes that took more cpu
time during the sampling time:
$ grapher -i boot.stats -c13 -r cpu

 tot%	ps%	cmdline
 -----------------------
   	27.7	/bin/sh /usr/bin/sugar
 44.6	16.8	jffs2_gcd_mtd0
 51.7	7.1	/usr/bin/env python /usr/bin/sugar-activity journal...
 56.2	4.5	python /usr/bin/datastore-service
 60.6	4.4	picker -t75 -o /tmp/boot.stats
 64.3	3.8	python /usr/sbin/rainbow-daemon --daemon
 68.0	3.6	xinit /usr/bin/olpc-session -- /usr/bin/X -fp built-ins...
 71.3	3.3	/bin/sh /usr/bin/olpc-session
 74.1	2.8	python /usr/bin/sugar-shell-service
 76.9	2.8	python /usr/bin/sugar-presence-service
 79.7	2.8	python /usr/sbin/rainbow-daemon --daemon
 81.2	1.5	hald
 82.6	1.4	ohmd

Renicing jffs2_gcd_mtd0 to 19 in haldaemon's init script slightly
speeds-up the last
part of the boot:
$ cat /home/olpc/.boot_time.prev  (clean joyride)
56.27

$ cat /home/olpc/.boot_time
54.14

Note however that when renicing jffs2_gcd_mtd0, boot timings `became
less deterministic, sometimes being slower than the `clean install'
case. Stopwatch timings (from boot loader to fully redrawn shell):
clean install : 1m34.0s
renice trick  : 1m30.2s +[0. to 4.5 sec]

Would it be possible to make DS, sugar-presence-service and
sugar-shell-service system services and thus decoupling them from the
shell's start-up?



The next tab shows the 10 processes that hit more major faults during
the sampling time:

$ grapher -i boot.stats -m6 -r faults

 minflt	majflt	cmdline
 -----------------------
 15363	154	/bin/sh /usr/bin/sugar
 3122	57	python /usr/sbin/rainbow-daemon --daemon
 1829	51	xinit /usr/bin/olpc-session -- /usr/bin/X -fp built-ins -wr
-aut
 3576	38	python /usr/bin/datastore-service
 498	37	ohmd
 1270	16	hald

A strace of `dbus-launch ... sugar-shell' in /usr/bin/sugar can be found
at:
http://dev.laptop.org/~rlucchese/boot/dbus-launch_sugar-shell.strace

A script was run that cat'ed all files (5700+) that sugar tries to open
on boot. (http://dev.laptop.org/~rlucchese/boot/cat_sugar_startup_files)

$ time sh cat_sugar_startup_files
real 0m16.9s
user 0m1.0s
sys  0m12.2s

Timings vary a lot when repeating the test (+-~30%).
`top' shows that the remaining time goes to io-wait.

Why don't those files get cached when repeating the test ?



cProfile statistics (KCG format) for sugar-shell (start-up only):
http://dev.laptop.org/~rlucchese/boot/cProfile-shell

Functions ordered by self-time (%):
 32.2   gtk.main()
 19.6   send_message_with_reply_and_block of dbus
 12.8   block of dbus.lowlevel.PendingCall
  7.3   grab of sugar._sugarext.KeyGrabber
  3.    _add_weight of sugar/shell/view/home/grid.py
  2.7  render_cairo of rsvg.Handle
  1.12 __getitem__ of sugar/shell/view/home/grid.py
---------------------------------------------------
 78.7 %
 
Perhaps some dbus calls can be made asynchronous; this is taking 30+% of
start-up time.

It's curious that `grab` takes so much time: I don't see anything
obvious looking at the code.

Note that the layout for the favorite's view was set to the ring type;
why is the shell computing weights in this case?


High level tasks/functions ordered by total-time:
 67.3   gtk.main()
 33.6   __init__ of sugar/view/Shell.py
 16.8   _start_journal_idle of sugar/view/Shell.py
 11.6   __init__ shell/view/keyhandler.py
  7.5   add sugar/shell/view/home/spreadlayout.py

Note that 45% of Shell.__init__ and the 89% of _start_journal_idle
went to blocking dbus calls.



The next tab shows start-up timings extracted from the statistics
gathered with picker (the sampler may have missed processes lasting
shorter than the sampling period):

http://dev.laptop.org/~rlucchese/boot/sugar_xo_boot_timings

$ grapher -i boot.stats  -r timings,cpu | `a bit of selection

 tot%	ps%	start	length	cmdline
 --------------------------------------
 16.9	16.8	0.00	74.99	jffs2_gcd_mtd0
  ..
 36.5	3.8	12.44	62.55	python /usr/sbin/rainbow-daemon --daemon
  ..
 37.8	0.1	18.88	56.11	/usr/sbin/olpc-dm
  ..
 38.0	0.0	19.32	55.67	/bin/sh /usr/bin/startx /usr/bin/...
  ..
 41.7	3.6	19.87	55.12	xinit /usr/bin/olpc-session -- ...
  ..
 46.0	3.3	24.27	50.72	/bin/sh /usr/bin/olpc-session
  ..
 74.0	27.7	28.24	46.75	/bin/sh /usr/bin/sugar
  ..
 75.1	0.4	49.15	25.84	matchbox-window-manager -use_titlebar...
  ..
 77.9	2.8	49.25	25.74	python /usr/bin/sugar-presence-service
  ..
 81.2	2.8	53.86	21.13	python /usr/bin/sugar-shell-service
  ..
 85.7	4.5	58.89	16.10	python /usr/bin/datastore-service
  ..
 92.8	7.1	62.86	12.13	usr/bin/env python /usr/... (journal)



Accurate shell's start-up timings were taken by instrumentation:
---first launch (normal boot)---
/usr/bin/sugar                              : (1217317352.992) --  0.
entering main() in shell/main.py            : (1217317368.162) -- 15.2
before entering gtk's mainloop shell/main.py: (1217317375.464) -- 22.5

---second launch (ctrl+alt+erase)---
/usr/bin/sugar                              : (1217317725.747) --  0.
entering main() in shell/main.py            : (1217317734.939) --  9.2
before entering gtk's mainloop shell/main.py: (1217317741.023) -- 15.3



The datastore service and the journal will be profiled in separate
sessions.

Thanks,

riccardo




More information about the Devel mailing list