OOM manager project

Marc E. Fiuczynski mef at CS.Princeton.EDU
Mon Jul 23 09:44:28 EDT 2007


Hello Jim,

When I first sent this, the message's timestamp was about half a day 
earlier than your initial message, so I am not sure you received it.  
My response to your OOM message is below.

Best regards,
Marc


Marc E. Fiuczynski wrote:
> Hello Jim,
>
> We (PlanetLab) have found that OOM does some relatively bad things, causing 
> a system to get into an unusable state.  We replaced OOM with something 
> that just panics and reboots rather than letting the system get into an 
> unrecoverable state, which we need as many of our PL servers are in 
> remote locations and unattended (kinda like your mini-servers will be, 
> but unlike your laptops).  And we then introduced a user-level OOM 
> governor, which is probably something far more rudimentary than what you 
> are after.  Our governor, called pl_mom because "she cleans up your 
> mess", assumes that separate applications/services are instantiated in 
> separate vservers (slices).  From what I gather, this is definitely the 
> direction that OLPC is going for the laptop and mini-server-gateways, so 
> our approach might be applicable, at least conceptually.
>
> What does pl_mom do?  At the moment she kills the vserver with the 
> largest aggregate VSZ (i.e., all processes within that vserver).  This 
> works for PlanetLab, but might not be the best approach for OLPC.  We have 
> found that most OOM scenarios are caused by a slow leaker whose pages are 
> swapped out by kswapd (this happens over a few hours and is hard to 
> detect with the current vm metrics we peek at).  Since pl_mom 
> does the trick for our usage scenario on PlanetLab for now we have not 
> had an incentive to improve it further.  However, one should definitely 
> look at better vm statistics to make a better choice than largest 
> aggregate VSZ.
>
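The selection rule described above (kill the group with the largest
aggregate VSZ) can be sketched roughly as follows. This is not pl_mom's
actual code; the function name and the (group, vsz_kb) input format are
assumptions made for illustration, and how the per-process VSZ values are
collected is left out.

```python
from collections import defaultdict


def largest_group_by_vsz(processes):
    """Return (group, total_vsz_kb) for the group with the largest summed VSZ.

    `processes` is an iterable of (group_name, vsz_kb) pairs, one pair per
    process. Returns None when there are no processes at all.
    """
    totals = defaultdict(int)
    for group, vsz_kb in processes:
        totals[group] += vsz_kb  # aggregate over every process in the group
    if not totals:
        return None
    return max(totals.items(), key=lambda item: item[1])


# Hypothetical sample: two slices with two processes each.
sample = [("slice_a", 1000), ("slice_b", 700), ("slice_a", 200), ("slice_b", 600)]
victim = largest_group_by_vsz(sample)
```

With the sample above, slice_b (1300 kB in total) is chosen over slice_a
(1200 kB) even though slice_a owns the single largest process, which is the
point of aggregating per vserver rather than per process.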
> The code for pl_mom is available via anon cvs from:
>
> cvs -d :pserver:anonymous@cvs.planet-lab.org:/cvs co pl_mom
>
> Take a peek at swap_mom.py and its helper functions in pl_mom.py.
>
> I'm cc'ing Faiyaz Ahmed, who is the person at Princeton who is currently 
> maintaining pl_mom.
>
> Best regards,
> Marc
>
> Jim Gettys wrote:
>   
>> OLPC needs an OOM governor, so that the "right" process gets shot when we
>> run low on RAM, and that processes that might get shot know enough to
>> save state for restart.  As you know, various problems appear if the
>> wrong process is killed, usually resulting in needing a restart.
>>
>> Note that the kernel has to be able to recover memory when it needs it,
>> or it will deadlock: this is a situation where the kernel must be in
>> control, but user space could cooperate much better than it does today,
>> by providing appropriate hints.  So don't say: "the kernel shouldn't
>> kill processes: user space should"; that design doesn't fly.
>>
>> Here's Kimmo Hämäläinen's description of the (current) kernel OOM killer:
>>
>> The OOM killer selects a process to kill by assigning a score to each
>> process; the process with the highest score is the lucky winner that
>> will be killed. The current OOM score for a process is visible in
>> proc. The entry is in /proc/PID/oom_score. The starting point of the
>> score is the amount of memory consumed by the process and its
>> children. This value is adjusted as follows:
>> • It is set to zero if the process has no memory management or if the
>>   process has a negative nice value (this can be used for protecting
>>   processes from the killer).
>> • Divided by the square root of the CPU time consumed by the process.
>> • Divided by the square root of the square root of the run time of the
>>   process.
>> • Multiplied by 2 if it is a process with a positive nice value.
>> • Divided by 4 if it is a superuser process.
>> • Divided by 4 if it is a process with direct hardware access.
>> • Finally, the value is adjusted (shifted either left or right) by the
>>   oom_adj value. It is shifted left in case the value is positive and
>>   right in case the value is negative.
>> This means that a negative oom_adj value will decrease the score and
>> also decrease the risk that the particular process will be killed. A
>> positive value will have the opposite effect. The value should be no
>> smaller than -16 and no larger than 15.
>>
>> Please note that you can set the oom_adj value in the proc file system.
>> It is located at /proc/PID/oom_adj. For more information about how the
>> OOM killer behaves, see the Linux kernel source code, mm/oom_kill.c in
>> particular.
>>
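To make the scoring steps quoted above concrete, here is a minimal sketch
that applies them in the order listed. It follows the prose description
only and is not a transcription of mm/oom_kill.c. The function name, the
parameter names, and the clamping of each divisor to at least 1 (to avoid
dividing by zero) are assumptions, and the units of memory and time are
left abstract because the description does not give them.

```python
import math


def oom_score_sketch(mem, cpu_time, run_time, nice=0, superuser=False,
                     hw_access=False, oom_adj=0, has_mm=True):
    """Approximate the OOM score from the prose description (illustrative)."""
    # Zero if there is no memory management or the nice value is negative.
    if not has_mm or nice < 0:
        return 0
    score = float(mem)  # starting point: memory of the process and children
    # Divisors are clamped to >= 1 (an assumption, avoids division by zero).
    score /= max(1.0, math.sqrt(cpu_time))
    score /= max(1.0, math.sqrt(math.sqrt(run_time)))
    if nice > 0:
        score *= 2
    if superuser:
        score /= 4
    if hw_access:
        score /= 4
    score = int(score)
    # Final adjustment: shift left for positive oom_adj, right for negative.
    if oom_adj > 0:
        score <<= oom_adj
    elif oom_adj < 0:
        score >>= -oom_adj
    return score
```

For example, with mem=1600, cpu_time=16 and run_time=16 the score is
1600 / 4 / 2 = 200; an oom_adj of 2 raises that to 800, and an oom_adj of -1
lowers it to 100, so each step of oom_adj doubles or halves the risk.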
>> So we need an OOM killer helper.  
>>
>> We have the ability to provide the kernel with much of the 
>> information it needs for much better behavior, if we choose.
>>
>> I see this project evolving through the following incremental
>> improvements (and incremental difficulty) as set out below:
>>
>> 1) start by setting the oom_adj appropriately so that the processes we
>> really care about don't get shot.
>>
>> 2) make this a window manager plug-in (a plug-in, as people including us
>> may end up using other window managers) that uses the stacking order on
>> the screen to rank order the activities that are running.
>>
>> 3) provide a mechanism by which applications may be given a hint that
>> they might find it good to save enough state for a checkpoint restart,
>> because they are likely a good candidate for shooting.
>>
>> 4) use the XRes facilities in X (and/or modify X) to provide the kernel
>> with the pixmap usage on a process ID basis, for local
>> applications/activities.
>>
>> 5) see if there are better OOM algorithms than Linux presently has.
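
Steps 1 and 2 could start from something like the sketch below, which maps
a stacking-order rank to an oom_adj value and writes it to
/proc/PID/oom_adj on kernels that expose that file. The linear mapping, the
function names, and the proc_root parameter (which exists only so the code
can be tried against a scratch directory) are assumptions; obtaining the
stacking order from the window manager is not shown.

```python
import os

# Range given in the description quoted earlier in this thread.
OOM_ADJ_MIN, OOM_ADJ_MAX = -16, 15


def adj_for_rank(rank, count, protected=False):
    """Map a stacking rank to an oom_adj value.

    rank 0 is the topmost window (least expendable) and rank count-1 is the
    bottom of the stack (most expendable). Protected processes always get
    the minimum value.
    """
    if protected:
        return OOM_ADJ_MIN
    if count <= 1:
        return 0
    # Spread ranks linearly over 0..OOM_ADJ_MAX (an arbitrary policy choice).
    return (rank * OOM_ADJ_MAX) // (count - 1)


def write_oom_adj(pid, value, proc_root="/proc"):
    """Clamp `value` to the valid range and write it to <proc_root>/<pid>/oom_adj."""
    value = max(OOM_ADJ_MIN, min(OOM_ADJ_MAX, value))
    path = os.path.join(proc_root, str(pid), "oom_adj")
    with open(path, "w") as f:
        f.write(str(value))
    return value
```

With four activities on screen, adj_for_rank yields 0, 5, 10 and 15 from
top to bottom, so the activity furthest down the stack becomes the
preferred victim. Lowering a process's value below its current setting
normally requires privileges.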
>>
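For step 3, one possible shape for the application side of the hint is a
signal handler that writes a checkpoint. Nothing in this thread specifies
the mechanism: the use of SIGUSR1, the JSON file format, the checkpoint
path, and every name below are assumptions chosen only to illustrate the
idea.

```python
import json
import os
import signal
import tempfile

# Hypothetical application state and checkpoint location.
STATE = {"document": "untitled", "cursor": 0}
CHECKPOINT_PATH = os.path.join(tempfile.gettempdir(), "activity-checkpoint.json")


def save_checkpoint(signum=None, frame=None, path=None):
    """Write STATE to disk; usable directly or as a signal handler."""
    path = path or CHECKPOINT_PATH
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(STATE, f)
    # Rename into place so a kill in mid-write never leaves a torn file.
    os.replace(tmp, path)


def install_hint_handler(signum=signal.SIGUSR1):
    """Treat `signum` as the 'you may be shot soon' hint (an assumption)."""
    signal.signal(signum, save_checkpoint)
```

An activity would call install_hint_handler() once at startup; whatever
component ranks the processes could then signal the likeliest victims
before memory actually runs out, giving them a chance to save state for a
later restart.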
>> Discussion?  Anyone want to take on this project, or parts of this
>> project?
>>                                         - Jim
>>
>>     
> _______________________________________________
> Devel mailing list
> Devel at lists.laptop.org
> http://lists.laptop.org/listinfo/devel
>   
