idea for running out of RAM

Sun Nov 2 01:07:40 EST 2008

On Sat, Nov 1, 2008 at 11:01 AM, Benjamin M. Schwartz
<bmschwar at fas.harvard.edu> wrote:
> Albert Cahalan wrote:

>> Memory reservations are a different beast entirely. Running
>> out of memory becomes approximately impossible because
>> the user is blocked from starting too many activities.
>
> This seems like a silly statement to me.  Almost every activity on the XO
> is capable of exceeding the hardware memory limit all on its own.

If so, then most are broken. Tux Paint doesn't suffer from
this defect.

The only semi-respectable excuse is that the activity accepts
arbitrary input. The web browser is the obvious example.
It's only semi-respectable because the activity can often have
an internal limit (enforced in the easy code paths) for this,
and because partial document rendering can prevent activities
like Read from having this problem.
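
A minimal sketch of such an internal limit, assuming the activity
loads its input through one helper. The names MAX_INPUT_BYTES,
InputTooLarge and read_bounded are hypothetical, and the 8 MB cap
is an arbitrary illustration, not a measured value.

```python
import io

# Hypothetical cap on how much input the activity will hold in RAM.
MAX_INPUT_BYTES = 8 * 1024 * 1024


class InputTooLarge(Exception):
    """Raised when input would push the activity past its own cap."""


def read_bounded(stream, limit=MAX_INPUT_BYTES, chunk_size=64 * 1024):
    """Read a binary stream in chunks, refusing to exceed `limit` bytes.

    The check sits in the one easy code path that loads input, so the
    activity fails cleanly instead of growing without bound.
    """
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > limit:
            raise InputTooLarge("input exceeds %d bytes" % limit)
        chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    # Small input under the cap is returned unchanged.
    data = read_bounded(io.BytesIO(b"hello"), limit=16)
    print(len(data))
```

On InputTooLarge the activity can show a partial document or an
error. Either way it stays within what it promised to use.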

If the activity cannot be modified to limit itself, then it can't
legitimately specify a reservation. Sugar can make these
badly behaved activities run alone, with nothing else started
alongside them.

> Per-activity memory reservations are also per-activity limits, and they
> are only safe if those limits are set higher than the maximum amount of
> memory required by that activity, and that maximum value is simply far too
> large.

The difference is that activities never get killed under a
reservation system unless one is malicious or horribly buggy.

Under a limit system, activities will die. It's unacceptable.

> I like the idea of memory reservations, and they were part of the
> original design, but if we set them high enough to be safe, we would have
> a single-tasking (and maybe zero-tasking!) operating system.

No, although there are massive usability advantages to
eliminating the ability to run multiple things at once.
When a kid runs multiple activities, 100% of the time it was
unintentional. The kid got confused, probably because the
damn frame popped up under his mouse and stole a click.

> I should also be clear that I don't think Activities should receive the
> low-mem signal.  I think Sugar should catch the low-mem signal, so that it
> can attempt to do something smarter than the OOM killer because it knows
> much more about the system.  For example, it can choose to kill the
> activity instance that is using the most memory, or the
> least-recently-used activity instance, or even the instance that has most
> recently saved its state.

Destroying the user's work by killing an activity: FAIL

> This works especially well if we also use the knobs on the OOM killer.
> For example, the low-mem signal, after pausing all other processes, could
> cause Sugar to (1) select an activity to kill, (2) set that activity's
> oomadj parameter to make sure that it will be the first one killed if we
> hit OOM (3) ask that instance to save its state to the datastore, (4)
> close the activity instance, and (5) pop up a notification to the user
> about what just happened.

In a fit of rage, the kid throws his XO out the window. It just ate
his work for the eleventy-seventh time today.

Lots of things are wrong with that.

You may kill an activity that could have survived; there is
no good way to tell when OOM will be hit until you hit it.

Setting oomadj doesn't prevent the laptop from getting so
slow that the user decides to hard reboot.

There is no reasonable way to "ask" an activity to save state.
People don't write perfectly modeless code with atomic
operations on a database.

Since we can often determine an upper bound for the RAM usage
of an activity, we can trivially determine if a given set of activities
is capable of causing OOM. If we determine that starting a new
activity would place the XO in danger of OOM, then there is no
excuse for allowing that activity to start.
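
Here is a rough sketch of that check, not real Sugar code. The
function can_start and its parameters are hypothetical; a
reservation is a number of KB, or None for an activity that
cannot bound itself and therefore has to run alone. The budget
is assumed to be whatever RAM remains after the system's own needs.

```python
def can_start(running_kb, candidate_kb, budget_kb):
    """Decide whether a new activity may start without risking OOM.

    running_kb   -- reservations of activities already running
                    (int KB each, or None if no bound was declared)
    candidate_kb -- reservation of the activity about to start
    budget_kb    -- RAM available to activities (an assumed figure)
    """
    # An activity with no declared bound may only run by itself.
    if candidate_kb is None:
        return len(running_kb) == 0
    # If an unbounded activity is already running, nothing else starts.
    if any(r is None for r in running_kb):
        return False
    # Otherwise the sum of upper bounds must fit in the budget.
    return sum(running_kb) + candidate_kb <= budget_kb


if __name__ == "__main__":
    # Illustrative numbers only.
    print(can_start([40000, 30000], 30000, 100000))
    print(can_start([40000, 30000], 30001, 100000))
```

The refusal happens before the new activity exists, so nothing
that is already running ever has to be killed.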

> The cgroups stuff could also help here, since the OOM killer by default
> thinks in terms of processes, but each Activity can be multiple processes.

That would cause activities to die. Work is lost. FAIL