Stability and Memory Pressure in 8.2
riccardo
riccardo.lucchese at gmail.com
Tue Sep 9 09:29:55 EDT 2008
On Tue, 2008-09-09 at 00:10 -0400, Michael Stone wrote:
> Dear devel@,
>
> Kim, Greg, and I have concluded that the instability we experience under
> memory-pressure in 8.2-759 and similar is the single "hard" issue that
> we wish to _attempt_ to address before releasing 8.2 on current
> timeframes. (We recognize that there are several other issues marked
> as blocking the release but we are confident that they will be resolved
> satisfactorily or are, in a few cases, beyond help.)
>
> Since most other aspects of the release seem to be running smoothly, Kim
> asked me to take a more direct role in organizing our efforts to produce a
> release which avoids memory pressure when possible and which is
> better-behaved when it strikes.
>
> To that end, I would like to ask for your assistance with the following
> questions and tasks:
>
> * We need to determine why we encounter low-memory and out-of-memory
> situations more frequently than in previous releases.
>
> - This means that we need to measure how our memory consumption
> profile has changed since our previous releases.
>
> (cscott observes that we were unable to attack the F-9 image size
> issues until we were able to quantify the effect of changes we had
> made or were considering making. Consequently, he suggests that we
> will be unable to attack our current space consumption problems
> until we are able to generate good numbers (and displays).)
>
> - We need to think carefully about (or measure) whether our
> memory-consumption patterns have changed. I am particularly
> skeptical of our widespread use of tmpfsen since the pages consumed
> by files stored on tmpfsen are permanently dirty (and are perhaps
> accounted for differently than pages mapped into processes' address
> spaces?)
>
> - We need to check the configuration of applications like Browse
> which have configurable caching behavior. (Search for "cache" or
> "capacity" in about:config; check for important compile-time
> configuration flags.)
>
> - We need to test in a variety of different network configurations
> in order to determine to what extent the network/presence
> environment affects memory consumption.
>
> * We need to check carefully for memory-leaks. Three mechanisms which
> occur to me include:
>
> 1) running the system for a period of time, then scanning for
> anomalies either manually or in some automated fashion from
> userland, kernel-land, or OFW (via SysRq or SMM).
>
> 2) setting rlimits on various processes and noting what dies
>
> 3) using debugging tools like the python garbage collection
> module, guppy/heapy, gdb+macros, valgrind, efence, purify, etc.
> looking for trouble.
>
> * We need to find out why the oom-killer is not killing things fast
> enough. Based on our results, we might consider configuring
> /proc/$pid/oom_adj to preferentially kill some processes (e.g., the
> foreground [or background?] activities.)
>
> * We need to determine whether the oom-killer is killing the right
> processes. (sysctl's vm.oom_dump_tasks can be set to 1 in order to
> get more verbosity from the oom-killer when it fires).
>
> * We ought to ponder whether there are any additional "dirty hacks" we
> can experiment with in order to reduce memory consumption; for
> example, running the Shell and Journal (and DS?) in one process or
> making use of the compressed-caching code published on this list some
> months ago.
>
> * Random other stuff to think about:
>
> - rlimits, cgroups, and the memory resource controller
>
> - the warnings in the ramfs and tmpfs code about the deadlocks that
> tmpfsen can generate under low- or no-memory conditions.
>
> - whether our kernel "overcommits" when allocation requests are made?
>
> - whether we can get Browse to behave intelligently when it receives
> BadAlloc errors from X?
>
> - how to run bootchart on the XO
>
> - how to generate decent statistics and graphics (preferably in an
> automated fashion) concerning memory usage as part of our test
> suite
>
> - system-tap's kmalloc2.stp example
>
> In conclusion, more to come once I have some actual data; _please_ feel
> free to assist in collecting it! (though be aware that I may 'volunteer'
> you if I need your help. (That means you, Tomeu, Riccardo, Deepak,
> ...)).
>
> Regards,
>
> Michael
Here are some (trivial) tools I've written and used, among others, to
study these issues; you may find them useful:
* picker [1]
For me it was handier to use than bootchart; it also shows per-process
memory usage.
* import timings and alloc statistics [2]
A patch to python that prints timings and memory usage diffs for every
imported module. The original timings patch is from Tomeu.
* python-allocstatsmodule [3]
Inspired by [2] but can be used inside python scripts to collect
stats on heap usage.
! If you use `allocstats' to measure per-module memory usage by wrapping
import statements, the values will be rough and not very useful because
of import cycles (at least for the most interesting modules ;P).
Example app at
http://dev.laptop.org/~rlucchese/utils/python_mods_import_stats.py
Note that [2] and [3] are best used with a python built with
--without-pymalloc.
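For anyone without the patched interpreter, the kind of per-import
measurement that [2] and [3] produce can be roughly approximated with the
standard library. The sketch below is not the allocstats API: it assumes a
Python 3 interpreter, uses tracemalloc instead, and measure_import is a
made-up helper name. As with the wrapping approach described above, a
module that was already loaded (through an import cycle or an earlier
import) will look nearly free.

```python
import importlib
import sys
import time
import tracemalloc


def measure_import(module_name):
    """Import module_name and return (seconds, bytes_grown).

    bytes_grown is the change in memory traced by tracemalloc while the
    import ran, so it includes whatever the module imports itself.
    """
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    try:
        before, _peak = tracemalloc.get_traced_memory()
        t0 = time.perf_counter()
        importlib.import_module(module_name)
        elapsed = time.perf_counter() - t0
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        # Only stop tracing if this call was the one that started it.
        if started_here:
            tracemalloc.stop()
    return elapsed, after - before


if __name__ == "__main__":
    # Example module names only; substitute the modules you care about.
    for name in ["json", "decimal"]:
        cached = name in sys.modules
        seconds, grown = measure_import(name)
        note = " (already imported)" if cached else ""
        print("%-12s %8.2f ms %10d bytes%s" % (name, seconds * 1e3, grown, note))
```

Running each measurement in a fresh interpreter is one way to keep earlier
imports from hiding the real cost of a module.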
We measured quite large memory savings from the preload&fork trick (as
expected, btw). I guess enabling it for `all' python processes would give
a good (mem saving)/(work hours) ratio.
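A minimal sketch of the preload&fork idea, assuming a POSIX system and
Python 3: a parent imports the heavy modules once and then forks each
worker, so the children start with those modules already loaded and share
the pages copy-on-write. PRELOAD, preload, spawn and child_main are
illustrative names, not code from Sugar or from the measurements mentioned
above.

```python
import importlib
import os
import sys

# Hypothetical list of heavy modules that every child would import anyway.
PRELOAD = ["json", "decimal"]


def preload(names):
    """Import every module in names once, in the parent process."""
    for name in names:
        importlib.import_module(name)


def spawn(func):
    """Fork a child that runs func() and exits with its result as status."""
    pid = os.fork()
    if pid == 0:
        # Child: modules imported by the parent are already in sys.modules
        # and their pages are shared copy-on-write with the parent.
        status = 1
        try:
            status = int(func() or 0)
        finally:
            # Skip the parent's cleanup handlers and buffered output.
            os._exit(status)
    return pid


def child_main():
    # No import cost here: the module was loaded before the fork.
    return 0 if "decimal" in sys.modules else 1


if __name__ == "__main__":
    preload(PRELOAD)
    pid = spawn(child_main)
    _, wait_status = os.waitpid(pid, 0)
    print("child exit code:", os.WEXITSTATUS(wait_status))
```

How much stays shared depends on how many pages the children write to
later; CPython's reference counting writes into object headers, so the
saving is likely smaller than the raw size of the preloaded modules.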
thanks,
riccardo
[1] git://dev.laptop.org/activities/picker
[2] http://dev.laptop.org/~rlucchese/patches/python_show_mem_stats_on_module_loading.patch
[3] git://dev.laptop.org/users/rlucchese/python-allocstatsmodule/.git