[RFC] Four solutions to NAND fillup

Erik Garrison erik at laptop.org
Thu Jul 24 19:14:47 EDT 2008


OLPC Developers,

Greg asked me to write a report on the solutions to the NAND fillup
problem.  We need to present a set of solutions to LATU as soon as
possible so that they can establish which solution(s) are tenable for
their deployment.

Several provisional solutions are in the works.  I describe and
evaluate them here.  I seek comments, suggestions, and clarifications.


Description:

-= 1. Boot on a read-only filesystem =-

For the 8.2.0 release our developers are working on getting the system
to boot and run without any writes to the underlying filesystem.  This
allows us to reach a state in which the user has access to the journal
and can start deleting items.


-= 2. Automatically delete files from the datastore =-

Chris Ball has produced a patch which will delete items from the
datastore when we encounter a NAND-full situation at boot.  It builds
a list of files in the datastore, sorts them in order of size, and
deletes them from largest to smallest until the system's free space
rises above some threshold.
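
As a rough illustration of the approach (this is not Chris's patch; the
function names, the directory argument and the threshold are all
hypothetical), a largest-first cleanup could be sketched as follows.
Note that jffs2 compresses data, so a file's apparent size is only an
approximation of the space its deletion reclaims.

```python
import os


def free_bytes(path):
    """Free space available to unprivileged users on path's filesystem."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def delete_largest_until_free(store_dir, min_free_bytes, free_fn=free_bytes):
    """Delete regular files under store_dir, largest first, until
    free_fn(store_dir) reaches min_free_bytes.  Returns the deleted paths.

    store_dir and min_free_bytes are illustrative parameters, not values
    taken from any real build.
    """
    files = []
    for root, _dirs, names in os.walk(store_dir):
        for name in names:
            path = os.path.join(root, name)
            if os.path.isfile(path) and not os.path.islink(path):
                files.append((os.path.getsize(path), path))
    files.sort(reverse=True)  # largest first

    deleted = []
    for _size, path in files:
        if free_fn(store_dir) >= min_free_bytes:
            break  # enough space reclaimed, stop deleting
        os.remove(path)
        deleted.append(path)
    return deleted
```

The free-space probe is passed in as free_fn so the loop can be
exercised without actually filling a filesystem.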


-= 3. Boot on a union-mounted writeable filesystem =-

A union mount (http://en.wikipedia.org/wiki/Union_mount) can be used
to unify a read/write filesystem (typically a ram-backed tmpfs) and a
read-only filesystem (such as a CD-R or a full jffs2 partition) into a
single writeable filesystem.  This arrangement allows us to boot Sugar
and run applications without any code-level modifications.


-= 4. Store a large file on jffs2 and delete when space is low =-

This solution is roughly equivalent to the union-mount solution (#3),
except that boot is guaranteed when NAND is full by removing a large
buffer file stored in the jffs2 root partition.



Discussion:

1. Booting Sugar even on a read-only filesystem is a development goal
for 8.2.0 (and, as I understand it, has mostly been achieved for that
release), but for Uruguay it may not be possible to push such a
complex set of changes from our development branch into the builds
which they have deployed.

2. This solution is by far the simplest and the most certain to resolve
the problem immediately.  However, automatically deleting files seems
to me, at the least, a confusing way for the system to handle NAND
fillup.  We are teaching children what to expect from computers.
Absolute breakage due to storage media exhaustion is intelligible;
apparently random patterns of file deletion are much less so.  More
problematically, the only metric which we can establish for automatic
deletion is size, and this may bias deletions toward specific
activities, perhaps ones needed or specifically desired by users.
Despite these concerns, we must acknowledge that such a change is
certainly within the realm of feasibility for Uruguay's deployment and
will at least resolve a mounting support problem caused by NAND
fillup.  In my opinion, this should be considered a viable failsafe
solution.

3. This solution has been tested and verified to boot Sugar and
launch activities on an otherwise unmodified 656 system with a full
NAND.  To boot on a union-mount, all that is required is the addition
of the aufs module (Another UnionFS) to the initramfs and a patch to
the initscripts to check if the system has passed the NAND fill
threshold.  A small amount of work is still required to update the
Journal to delete items from the jffs2 partition when the system is
running on a union mount.  Further work could be completed to force
the user to delete items, but it may be sufficient to simply alert the
user to the fact that the system will not save any data between
reboots until they delete enough items from their journal.  We will
also have to convey some information to the user about how close they
are to the fillup threshold.
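
To make the threshold check concrete, here is a minimal sketch of the
decision the initscripts would have to make.  The function names, the
mount point argument and the threshold value are assumptions of mine,
not taken from the tested patch; the same free-space figure could also
feed whatever indicator we show the user.

```python
import os


def free_fraction(path):
    """Fraction (0.0 to 1.0) of the filesystem at path that is still free."""
    st = os.statvfs(path)
    if st.f_blocks == 0:
        return 0.0  # treat an unreportable filesystem as full
    return float(st.f_bavail) / float(st.f_blocks)


def choose_root_mode(path, min_free_fraction):
    """Return "union" when free space is below the (hypothetical)
    threshold, meaning boot on the union mount; otherwise "normal"."""
    if free_fraction(path) < min_free_fraction:
        return "union"
    return "normal"
```

For example, choose_root_mode("/", 0.05) would select the union mount
once less than 5% of the root filesystem remains free; the 5% figure is
purely illustrative.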

4. This solution has not yet been tested, but it seems likely to work,
to similar effect as #3.  It presents us with a slightly different set
of issues; namely, we must manage the episodic creation and deletion
of a large file.  We also must forbid the user from creating more data
while the buffer file is not in existence, lest we decrease the amount
of buffer available or end up with an unbootable system.  This
requires a much more stringent recovery console than #3, such as an X
session running only an instance of the Journal activity.
Furthermore, depending on the size of the file, some percentage of
deployed systems may fall into NAND-full territory during upgrade.
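
The buffer-file lifecycle might look something like the sketch below.
Since this solution is untested, everything here is hypothetical: the
function names, the path and the size are placeholders.  One design
point worth noting is that jffs2 compresses data, so a buffer of zeros
would reserve very little real space; the sketch fills the file with
random bytes for that reason.

```python
import os


def reserve_buffer(path, size_bytes, chunk=64 * 1024):
    """Create the buffer file at path, filled with size_bytes of random
    (hence poorly compressible) data, and flush it to storage."""
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(os.urandom(n))
            remaining -= n
        f.flush()
        os.fsync(f.fileno())


def release_buffer(path):
    """Delete the buffer file to free space for boot.
    Returns True if a file was removed, False if none existed."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False
```

At boot with a full NAND we would call release_buffer(); once the user
has deleted enough items, reserve_buffer() would recreate the file.
While release_buffer() has returned True and the file has not yet been
recreated, the user must be prevented from writing new data.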



Erik


