Stability and Memory Pressure in 8.2

James Cameron quozl at laptop.org
Sun Sep 14 00:42:36 EDT 2008


On Fri, Sep 12, 2008 at 02:22:54PM +1200, Martin Langhoff wrote:
> So a moderate grow in memory footprint is not likely to be the
> problem. My suspicion is that some *very common* operation in the UI
> or our APIs has the potential to eat up a big chunk of your memory
> very quickly.

I agree: I also haven't seen a gradual decline, but rather a sudden
loss.

But then, that's the nature of running out of memory ... when the limit
is hit, that's it, there's no more memory to be freed, nothing can be
released, and so the failure is sudden.

I recall someone noticing that the animated activity icon was redrawing
the whole screen.  I think that has been fixed, and since then I haven't
seen as many OOMs during olpc-update.

The kernel we're using does not allocate physical memory until there is
demand.  That is to say, if a program allocates memory with brk, sbrk,
or higher-level calls such as malloc, there is no significant change in
the amount of free memory until and unless some process actually touches
what it has allocated, chiefly by writing to it.  (On Linux, reading
untouched anonymous memory is normally satisfied from a shared zero
page, so reads alone cost little.)

I've run a C program that tries to force the out of memory condition.

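#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>

/* Deliberately exhaust memory: allocate ever larger blocks, touch every
   byte with memset(3) so the kernel must back them, and never free them. */
int main(void) {
  int i;
  size_t j;          /* total bytes allocated and touched so far */
  char *x;
  for (j = 0, i = 8; i < 32; i++) {
    size_t n = (size_t)1 << i;   /* request size; avoids signed overflow at i == 31 */
    fprintf(stderr, "nom %010zu, pounce on %zu, ", j, n);
    x = calloc(n, 1);
    if (x == NULL) {
      /* back off: three decrements plus the loop's i++ retry at a quarter of n */
      fprintf(stderr, "ouch, backing off.\n");
      i -= 3;
      if (i < 7) i = 7;          /* never retry below 256 bytes */
      continue;
    }
    j += n;
    fprintf(stderr, "snared, ");
    memset(x, i + 1, n);         /* touch the pages; without this free memory barely drops */
    fprintf(stderr, "clawed, ");
    usleep(500000);
    fprintf(stderr, "meow!\n");
  }
  return 0;
}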

Note the call to memset(3).  Without this, there is no corresponding
decrease in free memory each time around the loop.

Note the behaviour on calloc(3) failure ... the three decrements,
together with the loop's own i++, lower the exponent by two, so the code
retries with a quarter of the request size and continues.

This program will generally cause OOM on all the builds I've tried it
on, so I consider it a useful reproducer.

As a test, I just ran the current version of the Measure activity on
joyride 2436 and on 8.2-760, then ran oom (the program above) in a shell
over SSH.

Here's the joyride 2436 output:

-bash-3.2# oom
nom 0000000000, pounce on 256, snared, clawed, meow!
nom 0000000256, pounce on 512, snared, clawed, meow!
nom 0000000768, pounce on 1024, snared, clawed, meow!
nom 0000001792, pounce on 2048, snared, clawed, meow!
nom 0000003840, pounce on 4096, snared, clawed, meow!
nom 0000007936, pounce on 8192, snared, clawed, meow!
nom 0000016128, pounce on 16384, snared, clawed, meow!
nom 0000032512, pounce on 32768, snared, clawed, meow!
nom 0000065280, pounce on 65536, snared, clawed, meow!
nom 0000130816, pounce on 131072, snared, clawed, meow!
nom 0000261888, pounce on 262144, snared, clawed, meow!
nom 0000524032, pounce on 524288, snared, clawed, meow!
nom 0001048320, pounce on 1048576, snared, clawed, meow!
nom 0002096896, pounce on 2097152, snared, clawed, meow!
nom 0004194048, pounce on 4194304, snared, clawed, meow!
nom 0008388352, pounce on 8388608, snared, clawed, meow!
nom 0016776960, pounce on 16777216, snared, clawed, meow!
nom 0033554176, pounce on 33554432, snared, clawed, meow!
nom 0067108608, pounce on 67108864, ouch, backing off.
nom 0067108608, pounce on 16777216, snared, clawed, meow!
nom 0083885824, pounce on 33554432, ouch, backing off.
nom 0083885824, pounce on 8388608, snared, clawed, meow!
nom 0092274432, pounce on 16777216, snared, clawed, meow!
nom 0109051648, pounce on 33554432, ouch, backing off.
nom 0109051648, pounce on 8388608, snared, clawed, meow!
nom 0117440256, pounce on 16777216, snared, clawed, meow!
nom 0134217472, pounce on 33554432, ouch, backing off.
nom 0134217472, pounce on 8388608, snared, clawed, meow!
nom 0142606080, pounce on 16777216, snared, clawed, meow!
nom 0159383296, pounce on 33554432, ouch, backing off.
nom 0159383296, pounce on 8388608, snared, clawed, meow!
nom 0167771904, pounce on 16777216, ouch, backing off.
nom 0167771904, pounce on 4194304, snared, clawed, meow!
nom 0171966208, pounce on 8388608, snared, Killed

By comparison, the 8.2-760 output ended with:

nom 0171966208, pounce on 2097152, snared, clawed, meow!
nom 0174063360, pounce on 4194304, snared, clawed, meow!
nom 0178257664, pounce on 8388608, snared, Killed

I don't think this difference of about 6MB (178257664 - 171966208 =
6291456 bytes) is significant.

The response on screen was interesting:

1.  at about the 67MB mark, Measure began to slow,

2.  at about the 92MB mark, Measure stopped drawing,

3.  for quite a few seconds, nothing was happening on screen, the
touchpad would not move the pointer, and the oom program was still
working,

4.  if the oom program was running as user olpc, it was killed, and
Measure resumed drawing; everything went back to normal,

5.  if the oom program was running as user root, several other things
were killed first, including the Sugar shell, Measure, X, and Journal,
and eventually the oom program was killed, which then allowed X and
Sugar to restart.

Obviously, don't run out of memory as root.

dmesg on each system showed that the oom-killer was being invoked in
response to memory demand from all sorts of processes, and only rarely
from the oom process itself.  I don't think this is a problem ... memory
demand comes from every process, especially once they have already been
trimmed bare.

There have been several reports of OOM during olpc-update.  When
olpc-update is run as root inside Terminal and an OOM occurs, the
non-root components that started olpc-update (X, Sugar, Terminal) are
killed.

-- 
James Cameron    mailto:quozl at us.netrek.org     http://quozl.netrek.org/