Stability and Memory Pressure in 8.2
Michael Stone
michael at laptop.org
Tue Sep 9 00:10:53 EDT 2008
Dear devel@,
Kim, Greg, and I have concluded that the instability we experience under
memory pressure in 8.2-759 and similar builds is the single "hard" issue
that we wish to _attempt_ to address before releasing 8.2 on the current
schedule. (We recognize that several other issues are marked as blocking
the release, but we are confident that they will be resolved
satisfactorily or are, in a few cases, beyond help.)
Since most other aspects of the release seem to be running smoothly, Kim
asked me to take a more direct role in organizing our efforts to produce a
release that avoids memory pressure when possible and that is
better-behaved when it strikes.
To that end, I would like to ask for your assistance with the following
questions and tasks:
* We need to determine why we encounter low-memory and out-of-memory
situations more frequently than in previous releases.
- This means that we need to measure how our memory consumption
profile has changed since our previous releases.
(cscott observes that we were unable to attack the F-9 image size
issues until we were able to quantify the effect of changes we had
made or were considering making. Consequently, he suggests that we
will be unable to attack our current memory-consumption problems
until we can generate good numbers (and displays).)
- We need to think carefully about (or measure) whether our
memory-consumption patterns have changed. I am particularly
skeptical of our widespread use of tmpfsen, since the pages consumed
by files stored on tmpfsen are permanently dirty (and are perhaps
accounted for differently than pages mapped into processes' address
spaces?)
- We need to check the configuration of applications like Browse
which have configurable caching behavior. (Search for "cache" or
"capacity" in about:config; check for important compile-time
configuration flags.)
- We need to test in a variety of different network configurations
in order to determine to what extent the network/presence
environment affects memory consumption.
* We need to check carefully for memory leaks. Three mechanisms that
occur to me are:
1) running the system for a period of time, then scanning for
anomalies either manually or in some automated fashion from
userland, kernel-land, or OFW (via SysRq or SMM).
2) setting rlimits on various processes and noting which ones die
3) using debugging tools like the Python garbage-collection module (gc),
guppy/heapy, gdb+macros, valgrind, efence, purify, etc., to look for
trouble.
* We need to find out why the oom-killer is not killing things fast
enough. Based on our results, we might consider configuring
/proc/$pid/oom_adj to preferentially kill some processes (e.g., the
foreground [or background?] activities).
* We need to determine whether the oom-killer is killing the right
processes. (Setting the vm.oom_dump_tasks sysctl to 1 makes the
oom-killer log a table of tasks and their memory usage when it fires.)
* We ought to ponder whether there are any additional "dirty hacks" we
can experiment with in order to reduce memory consumption; for
example, running the Shell and Journal (and DS?) in one process or
making use of the compressed-caching code published on this list some
months ago.
* Random other stuff to think about:
- rlimits, cgroups, and the memory resource controller
- the warnings in the ramfs and tmpfs code about the deadlocks that
tmpfsen can generate under low- or no-memory conditions.
- whether our kernel "overcommits" when allocation requests are made?
- whether we can get Browse to behave intelligently when it receives
BadAlloc errors from X?
- how to run bootchart on the XO
- how to generate decent statistics and graphics (preferably in an
automated fashion) concerning memory usage as part of our test
suite
- SystemTap's kmalloc2.stp example
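One possible starting point for the measurement item at the top of the list: a minimal Python 3 sketch that samples /proc/meminfo at a fixed interval and writes a few fields as CSV, so that runs on different builds under the same workload can be compared or plotted. The field selection, interval, and helper names are my assumptions, not an agreed methodology. Note that tmpfs-backed pages are counted under Cached, so a Cached figure that refuses to shrink under pressure is one hint that tmpfsen are pinning memory.

```python
import csv
import sys
import time

# Field names as they appear in /proc/meminfo. Which ones matter most for
# our memory-pressure problem is an assumption to be refined.
FIELDS = ["MemFree", "Buffers", "Cached", "Dirty", "AnonPages",
          "Mapped", "Slab", "Committed_AS"]


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dict.

    Most values are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def sample_meminfo(interval=5.0, count=12, path="/proc/meminfo"):
    """Yield (unix_time, meminfo_dict) pairs, `count` times, `interval` seconds apart."""
    for i in range(count):
        if i:
            time.sleep(interval)
        with open(path) as f:
            yield time.time(), parse_meminfo(f.read())


def write_csv(samples, out=None, fields=FIELDS):
    """Write a header plus one CSV row per sample; missing fields are left blank."""
    writer = csv.writer(out if out is not None else sys.stdout)
    writer.writerow(["time"] + list(fields))
    for ts, info in samples:
        writer.writerow(["%.1f" % ts] + [info.get(name, "") for name in fields])
```

For example, `write_csv(sample_meminfo(interval=5.0, count=120))` records roughly ten minutes of samples; keeping the output from each build would give us the comparable numbers cscott is asking for.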
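For leak-hunting mechanisms 2) and 3) above, a minimal Python 3 sketch using only the standard library. `type_counts()` and `growth()` use the gc module to diff live-object counts by type before and after some operation (a cheap first pass before reaching for guppy/heapy), and `run_with_as_limit()` starts a child under an RLIMIT_AS cap via the resource module. The helper names and the per-type diffing approach are my own illustration, not existing Sugar tooling.

```python
import collections
import gc
import resource
import subprocess


def type_counts():
    """Count live objects tracked by the cyclic GC, grouped by type name.

    Only container objects (class instances, lists, dicts, ...) are tracked,
    so leaks of plain strings or numbers will not show up here.
    """
    gc.collect()
    return collections.Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, threshold=0):
    """Return {type_name: increase} for types that grew by more than `threshold`.

    `before` and `after` are Counters from type_counts(); missing keys count as 0.
    """
    return {name: after[name] - before[name]
            for name in after
            if after[name] - before[name] > threshold}


def run_with_as_limit(argv, limit_bytes):
    """Run argv with its virtual address space capped at limit_bytes.

    Past the cap, allocations fail (malloc returns NULL, Python raises
    MemoryError), so the failure surfaces in the offending process rather
    than as system-wide memory pressure. Returns the child's exit status.
    """
    def apply_limit():
        # Runs in the child between fork() and exec().
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

    return subprocess.call(argv, preexec_fn=apply_limit)
```

In practice one would call `type_counts()`, exercise an activity (say, open and close a Journal entry a few hundred times), call it again, and look at `growth(before, after, threshold=50)`.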
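For the oom-killer and overcommit items above, a minimal Python 3 sketch of the relevant knobs. It assumes a 2.6-era kernel where /proc/<pid>/oom_adj accepts -17 (exempt from the oom-killer) or a bias from -16 to 15 (later kernels replace this file with oom_score_adj), and it treats vm.overcommit_memory modes 0, 1, and 2 as heuristic, always-overcommit, and strict accounting. The helper names are hypothetical; the `proc` parameter exists only so the functions can be exercised against a scratch directory.

```python
import os

PROC = "/proc"

OOM_DISABLE = -17  # exempt the task from the oom-killer entirely
OOM_ADJ_MIN, OOM_ADJ_MAX = -16, 15

OVERCOMMIT_MODES = {
    0: "heuristic overcommit",
    1: "always overcommit",
    2: "strict accounting (no overcommit)",
}


def set_oom_adj(pid, value, proc=PROC):
    """Bias the oom-killer for one process; positive values make it a likelier victim.

    Lowering the value normally requires root.
    """
    if value != OOM_DISABLE and not OOM_ADJ_MIN <= value <= OOM_ADJ_MAX:
        raise ValueError("oom_adj must be %d or in [%d, %d]"
                         % (OOM_DISABLE, OOM_ADJ_MIN, OOM_ADJ_MAX))
    with open(os.path.join(proc, str(pid), "oom_adj"), "w") as f:
        f.write("%d\n" % value)


def read_vm_sysctl(name, proc=PROC):
    """Read an integer vm sysctl, e.g. 'overcommit_memory' or 'oom_dump_tasks'."""
    with open(os.path.join(proc, "sys", "vm", name)) as f:
        return int(f.read().split()[0])


def write_vm_sysctl(name, value, proc=PROC):
    """Same effect as `sysctl -w vm.<name>=<value>`; requires root."""
    with open(os.path.join(proc, "sys", "vm", name), "w") as f:
        f.write("%d\n" % value)


def describe_overcommit(proc=PROC):
    """Return a human-readable name for the current overcommit policy."""
    mode = read_vm_sysctl("overcommit_memory", proc)
    return OVERCOMMIT_MODES.get(mode, "unknown mode %d" % mode)
```

As root, `write_vm_sysctl("oom_dump_tasks", 1)` turns on the verbose task dump, and `set_oom_adj(pid, 10)` marks an activity we would rather lose first; which activities deserve that treatment is still the open question.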
In conclusion: more to come once I have some actual data. _Please_ feel
free to help collect it! (Be aware, though, that I may 'volunteer' you if
I need your help. That means you, Tomeu, Riccardo, Deepak, ...)
Regards,
Michael
P.S. - Thanks to cscott and cjb for their advice during our brief
planning session.
P.P.S. - Please follow up if you think I missed any avenues that might
be worth pursuing in order to address this rather large and fuzzy problem
space -- there's plenty of room left for good ideas that didn't occur to
me, Scott, or Chris.