OOM manager project
Marc E. Fiuczynski
mef at CS.Princeton.EDU
Mon Jul 23 09:44:28 EDT 2007
Hello Jim,
When I first sent this reply, its timestamp was about half a day earlier
than your initial message, so I am not sure you received it.
Please see my response to your OOM message below.
Best regards,
Marc
Marc E. Fiuczynski wrote:
> Hello Jim,
>
> We (PlanetLab) have found that OOM does some relatively bad things causing
> a system to get into an unusable state. We replaced OOM with something
> that just panics and reboots rather than letting the system get into an
> unrecoverable state, which we need as many of our PL servers are in
> remote locations and unattended (kinda like your mini-servers will be,
> but unlike your laptops). And we then introduced a user-level OOM
> governor, which is probably something far more rudimentary than what you
> are after. Our governor, called pl_mom because "she cleans up your
> mess", assumes that separate applications/services are instantiated in
> separate vservers (slices). From what I gather, this is definitely the
> direction that OLPC is going for the laptop and mini-server-gateways, so
> our approach might be applicable, at least conceptually.
>
> What does pl_mom do? At the moment she kills the vserver with the
> largest aggregate VSZ (i.e., all processes within that vserver). This
> works for PlanetLab, but might not be the best approach for OLPC. We have
> found that most OOM scenarios are caused by a slow leaker whose pages are
> swapped out by kswapd (this happens over the course of a few hours and is
> hard to detect with the current vm metrics we peek at). Since pl_mom
> does the trick for our usage scenario on PlanetLab for now we have not
> had an incentive to improve it further. However, one should definitely
> look at better vm statistics to make a better choice than largest
> aggregate VSZ.
>
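The selection rule described above can be sketched in a few lines of Python.
This is only an illustration of "largest aggregate VSZ", not the actual
pl_mom code: the function name pick_victim and the (slice_name, vsz_kb)
input format are assumptions, and gathering the per-process numbers is left
out.

```python
from collections import defaultdict


def pick_victim(processes):
    """Return (slice_name, total_vsz_kb) for the slice with the largest summed VSZ.

    `processes` is an iterable of (slice_name, vsz_kb) pairs, one per process.
    Returns None when there are no processes at all.
    """
    totals = defaultdict(int)
    for slice_name, vsz_kb in processes:
        totals[slice_name] += vsz_kb
    if not totals:
        return None
    # On a tie, max() keeps the slice that was seen first.
    return max(totals.items(), key=lambda item: item[1])


# Hypothetical sample data: two processes in slice_a, one in slice_b.
sample = [("slice_a", 120_000), ("slice_b", 300_000), ("slice_a", 250_000)]
print(pick_victim(sample))  # ('slice_a', 370000)
```

Note that slice_b holds the single largest process, yet slice_a is chosen
because the sum over all of its processes is larger.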
> The code for pl_mom is available via anon cvs from:
>
> cvs -d :pserver:anonymous@cvs.planet-lab.org:/cvs co pl_mom
>
> Take a peek at swap_mom.py and its helper functions in pl_mom.py.
>
> I'm cc'ing Faiyaz Ahmed, who is the person at Princeton who is currently
> maintaining pl_mom.
>
> Best regards,
> Marc
>
> Jim Gettys wrote:
>
>> OLPC needs an OOM governor, so that the "right" process gets shot when we
>> run low on RAM, and that processes that might get shot know enough to
>> save state for restart. As you know, various problems appear if the
>> wrong process is killed, usually resulting in needing a restart.
>>
>> Note that the kernel has to be able to recover memory when it needs it,
>> or it will deadlock: this is a situation where the kernel must be in
>> control, but user space could cooperate much better than it does today,
>> by providing appropriate hints. So don't say: "the kernel shouldn't
>> kill processes: user space should"; that design doesn't fly.
>>
>> Here's Kimmo Hämäläinen's description of the (current) kernel OOM killer.
>>
>> The OOM killer selects a process to kill by assigning a score to each
>> process; the process with the highest score is the lucky winner that
>> will be killed. The current OOM score for a process is visible in proc,
>> in /proc/PID/oom_score. The starting point of the score is the amount of
>> memory consumed by the process and its children. This value is adjusted
>> as follows:
>> • It is set to zero if the process has no memory management or if the
>> process has a negative nice value (this can be used for protecting
>> processes from the killer).
>> • Divided by the square root of the CPU time consumed by the process.
>> • Divided by the square root of the square root of the run time of the
>> process.
>> • Multiplied by 2 if it is a process with a positive nice value.
>> • Divided by 4 if it is a superuser process.
>> • Divided by 4 if it is a process with direct hardware access.
>> • Finally, the value is adjusted (shifted either left or right) by the
>> oom_adj value. It is shifted left in case the value is positive and
>> right in case the value is negative.
>> This means that a negative oom_adj value will decrease the score and
>> also decrease the risk that the particular process will be killed. A
>> positive value will have the opposite effect. The value should be no
>> smaller than -16 and no larger than 15.
>>
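To make the adjustment steps above concrete, here is a rough Python sketch
that applies them in the order listed. It follows the prose description only,
not the kernel source: the function and parameter names are made up, integer
arithmetic and the skipping of zero divisors are assumptions, and the units
of cpu_time and run_time are left unspecified because the description does
not give them.

```python
import math  # math.isqrt needs Python 3.8 or newer


def oom_score(memory, cpu_time, run_time, nice=0, superuser=False,
              hw_access=False, oom_adj=0, has_mm=True):
    """Approximate the score from the description above (illustrative only)."""
    # Zero for processes without memory management or with a negative nice value.
    if not has_mm or nice < 0:
        return 0
    score = memory
    # Divide by sqrt(CPU time); a zero divisor is skipped (assumption).
    cpu_div = math.isqrt(cpu_time)
    if cpu_div:
        score //= cpu_div
    # Divide by the fourth root of the run time, again skipping zero.
    run_div = math.isqrt(math.isqrt(run_time))
    if run_div:
        score //= run_div
    if nice > 0:
        score *= 2
    if superuser:
        score //= 4
    if hw_access:
        score //= 4
    # Positive oom_adj shifts left (more likely to be killed), negative shifts right.
    if oom_adj > 0:
        score <<= oom_adj
    elif oom_adj < 0:
        score >>= -oom_adj
    return score


print(oom_score(4096, cpu_time=16, run_time=256))              # 256
print(oom_score(4096, cpu_time=16, run_time=256, oom_adj=-2))  # 64
```

With these made-up inputs, an oom_adj of -2 cuts the score to a quarter,
which is the protective effect described above.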
>> Please note that you can set the oom_adj value in the proc file system.
>> It is located at /proc/PID/oom_adj. For more information about how the
>> OOM killer behaves, see the Linux kernel source code, mm/oom_kill.c in
>> particular.
>>
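Setting the value from user space amounts to writing a number into that proc
file. A minimal sketch, assuming the /proc/PID/oom_adj layout and the -16 to
15 range quoted above; the helper name set_oom_adj and its proc_root
parameter (there so the function can be tried against a scratch directory)
are hypothetical.

```python
from pathlib import Path


def set_oom_adj(pid, value, proc_root="/proc"):
    """Write `value` to <proc_root>/<pid>/oom_adj and return the path written."""
    # Range taken from the description above.
    if not -16 <= value <= 15:
        raise ValueError("oom_adj must be between -16 and 15")
    path = Path(proc_root) / str(pid) / "oom_adj"
    path.write_text(f"{value}\n")
    return path
```

For example, set_oom_adj(1234, -16) would make process 1234 much less likely
to be chosen. Lowering the value normally requires elevated privileges, so
the call may fail with a permission error when run as an ordinary user.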
>> So we need an OOM killer helper.
>>
>> We have the ability to provide the kernel with much of the
>> information it needs for much better behavior, if we choose.
>>
>> I see this project evolving through the following incremental
>> improvements (and incremental difficulty) as set out below:
>>
>> 1) start by setting the oom_adj appropriately so that the processes we
>> really care about don't get shot.
>>
>> 2) make this a window manager plug-in (plug-in, as people including us
>> may end up using other window managers) that uses the stacking order on
>> the screen to rank order the activities that are running.
>>
>> 3) provide a mechanism by which applications may be given a hint that
>> they might find it good to save enough state for a checkpoint restart,
>> because they are likely a good candidate for shooting.
>>
>> 4) use the XRes facilities in X (and/or modify X) to provide the kernel
>> with the pixmap usage on a process ID basis, for local
>> applications/activities.
>>
>> 5) see if there are better OOM algorithms than Linux presently has.
>>
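As one possible reading of steps 1 and 2, the stacking order could be turned
into oom_adj values by spreading the activities evenly over part of the
allowed range, so that the topmost window is the least likely to be shot.
The sketch below is only that: the function name, the choice of the 0 to 15
range, and the linear spacing are all assumptions, and obtaining the stacking
order from the window manager is left out.

```python
def rank_to_oom_adj(pids_top_to_bottom, lowest=0, highest=15):
    """Map PIDs, ordered from topmost to bottommost window, to oom_adj values.

    The topmost activity gets `lowest`, the bottommost gets `highest`, and
    the ones in between are spaced evenly using integer arithmetic.
    """
    count = len(pids_top_to_bottom)
    if count == 0:
        return {}
    if count == 1:
        return {pids_top_to_bottom[0]: lowest}
    span = highest - lowest
    return {
        pid: lowest + (index * span) // (count - 1)
        for index, pid in enumerate(pids_top_to_bottom)
    }


# Hypothetical PIDs, topmost window first.
print(rank_to_oom_adj([101, 102, 103, 104]))
# {101: 0, 102: 5, 103: 10, 104: 15}
```

The resulting values would then be written to each process's oom_adj entry
whenever the stacking order changes.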
>> Discussion? Anyone want to take on this project, or parts of this
>> project?
>> - Jim
>>
>>