NAND full issue
Kimberley Quirk
kim at laptop.org
Fri Jul 25 18:40:14 EDT 2008
NAND Full Issue
Attending: Greg, Michael, Joe, Erik, Charlie, Chris, Kim
We discussed the problem recently uncovered in Uruguay and the solutions
and suggestions that have been posted, and came up with a proposal for
moving forward.
Problem statement: bug 7587
With build 656, when the file system is completely full the laptop
will not boot. Currently Uruguay has to ship the laptops back to
repair centers, incurring both shipping costs and downtime.
Please see below for the 5 types of avoidance, recovery, build image
solutions and the bug fix solution. (Most of these have been discussed
in some detail on other threads).
This proposal addresses the problem for Uruguay a little differently
than for other build 656 customers as Uruguay is already diverged from
our code base. They may want a more elaborate solution that they can
test and deploy at their own pace.
OLPC's response is "Failsafe" for 656, per703, and 8.1.2; and a formal
bug fix for 8.2 going forward:
"Failsafe" OS - includes the "Automatic Free Space" recovery in a
build image. This works for laptops that are already refusing to boot
as well as for preventing the non-boot problem. On boot up, this will
check for free NAND and if there isn't enough to boot, it will display
a message that it is deleting files, and it will remove the largest
file(s) until 50M is available and then finish booting. This can be
delivered on a USB stick. Each country technical liaison can decide if
they want to update all laptops, or wait until laptops see the problem
(which could be many months). It should also be incorporated in Peru's
build (703), which we need to deliver early next week, so we can avoid
the problem for 100k laptops.
The formal bug fix, with better notifications and the ability for the
user to choose what to delete, will be described in 7587 and will be
delivered in 8.2.
Uruguay:
Erik is working with Uruguay on the solution described as "Union
Mount" below. It is important that Uruguay own this bug fix themselves
so they can maintain it as needed, test it to their satisfaction, and
decide how to distribute it. This can be delivered on a USB stick or as
a wireless download. Uruguay also has the choice to use the options
supported by OLPC above.
Thoughts?
Kim
-------
AVOIDANCE:
If the students/teachers had a regime of deleting files, that might
avoid the problem.
In Uruguay they are capable of displaying a dialog box at 85% full;
they can use that to avoid the problem.
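
As a rough illustration of the 85% check, here is a minimal sketch in
Python. It is not Uruguay's actual dialog code; the function names and
the default path are hypothetical, and only the 85% figure comes from
the notes above.

```python
import shutil

WARN_FRACTION = 0.85  # the 85% figure mentioned in the notes


def usage_fraction(total_bytes, used_bytes):
    """Return the fraction of the filesystem in use (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def should_warn(total_bytes, used_bytes, threshold=WARN_FRACTION):
    """True when usage has reached the warning threshold."""
    return usage_fraction(total_bytes, used_bytes) >= threshold


def check_path(path="/"):
    """Hypothetical helper: check the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return should_warn(usage.total, usage.used)
```

A caller would show the dialog whenever check_path() returns True.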
RECOVERY SOLUTIONS -
Reflash the build via local USB stick - today this is not possible
because of their activation system.
Automatic Free Space:
Provide a USB-bootable build that would free space in some way. Can we
identify a class of things that we know can be deleted (like the cracklib
dictionary of unsafe passwords, or large activities)? Add a note that a
delete is going to happen during boot.
BUILD SOLUTIONS -
Union Mount:
Erik's 'union mounting' (UFS) - check at boot if you are above
threshold. If so, mount the root as readonly and redirect write
requests. Nothing would write permanently. You can mark things for
delete, which will get deleted at the next shutdown. This can be
deployed ahead of time.
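
The two decisions described here (which mode to boot in, and what to
delete at shutdown) can be sketched roughly as follows. This is not
Erik's implementation and does no mounting at all; the mode names, the
PendingDeletes class, and the threshold argument are all hypothetical,
since the notes do not give a threshold value.

```python
import os


def choose_root_mode(used_fraction, threshold):
    """Pick a boot mode: read-only root with redirected writes when
    usage is above the (caller-supplied, hypothetical) threshold."""
    if used_fraction > threshold:
        return "read-only-overlay"
    return "read-write"


class PendingDeletes:
    """Keeps paths marked for deletion until shutdown."""

    def __init__(self):
        self._paths = []

    def mark(self, path):
        # Ignore duplicates so each path is removed at most once.
        if path not in self._paths:
            self._paths.append(path)

    def apply(self, remove=os.remove):
        """Delete every marked path; return the ones actually removed."""
        removed = []
        for path in self._paths:
            try:
                remove(path)
                removed.append(path)
            except OSError:
                # A file that is already gone or cannot be removed is skipped.
                pass
        self._paths = []
        return removed
```

The shutdown script would call apply() once, after the read-only root
has been remounted writable.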
Failsafe:
Can be inserted in the build and includes 'automatic free space'. It opens
the datastore and sorts by size, wants to find 50M, and pops off the stack,
deleting stuff from largest to smallest. Can it explain afterwards
what it has done, or explain ahead of time what it might do? Provide
options for what to delete.
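
A minimal sketch of the largest-first selection, assuming "50M" means
50 MiB and that the datastore can be treated as a directory tree of
ordinary files. The function names are hypothetical and this is not the
actual Failsafe code. The dry-run mode is one way to "explain ahead of
time" what would be removed.

```python
import os

TARGET_FREE_BYTES = 50 * 1024 * 1024  # "50M" from the notes; MiB is assumed


def plan_deletions(files, free_bytes, target_free=TARGET_FREE_BYTES):
    """files: iterable of (path, size_bytes).
    Return (plan, projected_free), where plan lists (path, size) pairs,
    largest first, chosen until projected free space reaches the target."""
    plan = []
    projected = free_bytes
    for path, size in sorted(files, key=lambda item: item[1], reverse=True):
        if projected >= target_free:
            break
        plan.append((path, size))
        projected += size
    return plan, projected


def list_files(root):
    """Collect (path, size) for every regular file under root."""
    entries = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            full = os.path.join(dirpath, name)
            try:
                entries.append((full, os.path.getsize(full)))
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return entries


def free_space(root):
    """Bytes available to unprivileged users on root's filesystem."""
    st = os.statvfs(root)
    return st.f_bavail * st.f_frsize


def recover(root, dry_run=True):
    """Print the plan; delete the files only when dry_run is False."""
    plan, _projected = plan_deletions(list_files(root), free_space(root))
    for path, size in plan:
        verb = "would delete" if dry_run else "deleting"
        print("%s %s (%d bytes)" % (verb, path, size))
        if not dry_run:
            os.remove(path)
    return plan
```

On a compressing flash filesystem the space actually freed may differ
from the file size, so a real tool would probably re-check free space
after each delete rather than trust the projection.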
Big File:
At reboot, a big file is written and saves space for the case when you
can't boot. Seems like it isn't a great idea. Two step boot process -
every boot we check that there is a file of a good size; still should
have a GUI for deciding what to delete.
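
For comparison, the reserve-file idea might look like the sketch below.
The function names and the choice to fill the file with random bytes
are assumptions, not something settled in the meeting; random data is
used so that a compressing filesystem cannot shrink the reservation.

```python
import os


def reserve_ok(path, min_bytes):
    """True if the reserve file exists and is at least min_bytes long."""
    try:
        return os.path.getsize(path) >= min_bytes
    except OSError:
        return False


def write_reserve(path, size_bytes, chunk=1024 * 1024):
    """Write size_bytes of incompressible data so the space is really held."""
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(os.urandom(n))
            remaining -= n
        f.flush()
        os.fsync(f.fileno())


def release_reserve(path):
    """Delete the reserve file to free space; return True if it existed."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False
```

The first boot step would call release_reserve() when the NAND is full,
and the normal boot path would call reserve_ok() and write_reserve() to
put the file back.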
The Fix: (fix to 7587)
When the NAND is full, Sugar will boot but not be allowed to write. A
notification about space and inability to write needs to be displayed.