NAND out of space crash

John Watlington wad at laptop.org
Mon Jul 21 15:57:25 EDT 2008


It sounds like you are working on the root causes.
Today I'm hanging out with the logistics/repair team,
and the problem is worse than I thought this morning.
They are being inundated with "new" problems caused
by a full disk (though they weren't really aware that was the cause).

Since fixes in 8.2 won't help them for months, they need
the short term fix (c).   I will talk to Fiorella and her team
about progress on that tomorrow.
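
For illustration only, the check behind a warning like (c) could be
sketched roughly as below. This is not the script Uruguay is writing;
the 5% threshold, the function names, and the assumption that "/" lives
on the NAND are all hypothetical, and the free space JFFS2 reports may
only be approximate.

```python
import shutil

# Assumed threshold: treat the store as critically full below 5% free.
CRITICAL_FREE_FRACTION = 0.05


def is_critical(free_bytes, total_bytes, threshold=CRITICAL_FREE_FRACTION):
    """Return True when the free fraction is below the threshold."""
    if total_bytes <= 0:
        # No usable size information; err on the side of warning.
        return True
    return (free_bytes / total_bytes) < threshold


def nand_is_critically_full(path="/"):
    """Check the filesystem holding `path` (assumed to be the NAND root)."""
    usage = shutil.disk_usage(path)
    return is_critical(usage.free, usage.total)


if __name__ == "__main__":
    if nand_is_critically_full():
        print("Warning: NAND is critically full; delete Journal entries.")
```

Keeping the comparison in its own function means the threshold logic
can be exercised without actually filling a disk.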

They also need a way of repairing these in the field.
Mailing them back to LATU for reflashing is costing a fortune.
Over 55% of their returns for repair are fixed by
reflashing/reactivating.

There are two problems with a teacher reflashing them:
1) The teachers don't have activation keys for the machines,
   and Uruguay doesn't want to start giving them out.
2) Currently, there is no monolithic image for Uruguay
   (I was unaware of this, but they say that first they reflash, then
   they activate, then they install the Uruguay-specific scripts.)

It seems like we should be able to produce an upgrade and
customize key that does all of this in one step and preserves
the laptop's activation key.

Thoughts?
wad

On Jul 21, 2008, at 2:39 PM, C. Scott Ananian wrote:

> On Mon, Jul 21, 2008 at 12:52 PM, Jim Gettys <jg at laptop.org> wrote:
>> There are two issues here that we should be sure to not intertwingle:
>>
>> 1) whatever behavior Sugar may have when low/out of space, during
>> operation, or at boot time.
>
> A number of independent issues here:
>  a) the initscripts should be sure to unfreeze the dcon if/when X
> fails to start.  This ensures that the system is obviously recoverable
> (you can recover by rebooting with the check key held down, but this
> is not obvious!).
>  b) sugar should, ideally, start even if flash is full.   It is
> currently failing when writing to ~olpc/.boot_time or some such, and
> crashing.
>  c) once sugar starts, there should be a message indicating that the
> NAND is critically full.
>  d) trying to save new content to the journal should also give an
> obvious message that the NAND is full.
>  e) removing content from the journal should work even if NAND is
> full.
>
> I think (a), (b), and (e) are critical for 8.2.  (c) is being handled
> independently by Uruguay, and (c) and (d) should be targets for 9.1.
>
>> 2) JFFS2's behavior when the file system is almost full.  When it gets
>> almost full, it can spend all its time trying to garbage collect, and
>> you can lose completely (the system sort of gets the "slows", and grinds
>> to a halt).
>>
>> As to 2), there are patches done by Nokia (deployed on the N800 and
>> similar devices) that reserve some extra space and report out of space
>> before the system "gets the slows".  These are in Dave's incoming queue
>> to merge into JFFS2 the last I heard.  I don't know if he's merged them.
>
> These are less critical, IMO.  I have filled up NAND, and "the slows"
> are not debilitating.  The issues above are. We should encourage Dave
> to fix this issue and the other known JFFS2 bugs (trac #6480, for
> instance)  -- or get dsaxena to do so -- for 9.1.
>  --scott
>
> -- 
>  ( http://cscott.net/ )



