[Trac #190] JFFS2 power failure handling

Zarro Boogs per Child bugtracker at laptop.org
Wed Oct 18 06:06:23 EDT 2006


#190: JFFS2 power failure handling
--------------------+-------------------------------------------------------
 Reporter:  dwmw2   |       Owner:  dwmw2
     Type:  defect  |      Status:  new  
 Priority:  normal  |   Milestone:  CTest
Component:  kernel  |    Keywords:       
--------------------+-------------------------------------------------------
 A pathological pattern of power cycles can cause JFFS2 to entirely run out
 of space to perform garbage collection. For example, hook up an automatic
 power switch triggered by the NAND controller and make it cut power in the
 middle of each page write, unless it's the first page in an eraseblock.

 Each eraseblock then holds only a _single_ valid node, so it can't be
 erased -- but the rest of the space in each block is wasted by failed
 write attempts. You end up with no free space left into which you can
 garbage collect, and you see nasty messages like ‘Argh. No free space
 left for GC’.
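
 The failure pattern above can be illustrated with a toy model. This is
 only a sketch, not JFFS2 code: it assumes one node per page and that
 every successful write holds live data, and the names
 simulate_power_cuts, erasable_blocks and free_pages are made up for the
 example.

```python
def simulate_power_cuts(num_blocks, pages_per_block):
    """Toy model: one node per page, power cut on every write except
    the first page of each eraseblock (the pattern described above)."""
    blocks = []
    for _ in range(num_blocks):
        valid = wasted = 0
        for page in range(pages_per_block):
            if page == 0:
                valid += 1   # write completes: one valid node
            else:
                wasted += 1  # power cut mid-write: page used up, no valid node
        blocks.append({"valid": valid, "wasted": wasted})
    return blocks


def erasable_blocks(blocks):
    """A block can only be erased once it holds no valid nodes."""
    return [i for i, b in enumerate(blocks) if b["valid"] == 0]


def free_pages(blocks, pages_per_block):
    """Pages still available for GC to write into."""
    return sum(pages_per_block - b["valid"] - b["wasted"] for b in blocks)


blocks = simulate_power_cuts(num_blocks=4, pages_per_block=8)
```

 In this model every block ends up with one valid node and the rest of its
 pages wasted, so no block can be erased and no page is left for GC to
 write into.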

 There are ways to deal with this -- you can reserve an extra eraseblock
 for "panic" garbage collection, where you don't GC normally, but instead
 just _copy_ nodes intact from the least-used eraseblock, just as we copy
 REF_PRISTINE nodes. If power is interrupted during this 'panic GC', we can
 erase the target block and try again. If we manage to write the whole
 thing, we can erase the _source_ block, which becomes our panic block for
 the future. The tail end of the block we've just written is still free,
 which allows normal GC to proceed.
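
 A minimal sketch of the panic GC procedure described above, under stated
 assumptions: blocks are modelled in memory, "least-used" is taken to mean
 "fewest valid nodes", and a power cut is simulated with an exception and
 an immediate retry rather than a reboot. Block, PowerFailure, panic_gc
 and flaky_writer are hypothetical names, not JFFS2 identifiers.

```python
from dataclasses import dataclass, field


class PowerFailure(Exception):
    """Simulated power cut in the middle of a node write."""


@dataclass
class Block:
    capacity: int                              # nodes that fit in the block
    valid: list = field(default_factory=list)  # live nodes as (inode, version)
    wasted: int = 0                            # slots lost to failed writes

    def free(self):
        return self.capacity - len(self.valid) - self.wasted

    def erase(self):
        self.valid = []
        self.wasted = 0


def panic_gc(blocks, panic_idx, write_node):
    """Copy the least-used block into the panic block; return the index
    of the block that becomes the new panic block."""
    target = blocks[panic_idx]
    others = [i for i in range(len(blocks)) if i != panic_idx]
    src_idx = min(others, key=lambda i: len(blocks[i].valid))
    source = blocks[src_idx]
    while True:
        try:
            for node in source.valid:
                write_node(target, node)   # copy intact, no re-encoding
            break
        except PowerFailure:
            target.erase()                 # interrupted: erase target, retry
    source.erase()                         # source is the panic block from now on
    return src_idx


def flaky_writer(failures):
    """Hypothetical writer that raises PowerFailure for the first `failures` calls."""
    remaining = [failures]

    def write_node(block, node):
        if remaining[0] > 0:
            remaining[0] -= 1
            raise PowerFailure()
        block.valid.append(node)

    return write_node


blocks = [
    Block(4, valid=[(1, 1)], wasted=3),
    Block(4, valid=[(2, 1), (3, 1)], wasted=2),
    Block(4),                                  # reserved panic block
]
new_panic = panic_gc(blocks, panic_idx=2, write_node=flaky_writer(1))
```

 After the call, the old panic block holds the single copied node with
 three slots free for normal GC, and the erased source block takes over as
 the panic block.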

 This relies on being able to recover properly after the power failure --
 we have to:

  1. Know that we're in panic mode, which is easy enough because there are
     no free eraseblocks.
  2. Find the block which we were copying into, which is also relatively
     easy because it contains only nodes which are _duplicates_ of other
     nodes on the medium, with precisely the same version number.
     Currently we silently drop one of the duplicates, but we can change
     that to keep a record so that we can find blocks which are entirely
     made up of such nodes.
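
 The two recovery steps above might look like this at scan time. Again a
 sketch under assumptions: each block is a list of (inode, version) tuples,
 an empty list stands for a free eraseblock, and in_panic_mode and
 find_copy_targets are illustrative names only. If a copy had completed
 but the source had not yet been erased, both blocks would qualify, so the
 function returns every candidate.

```python
from collections import Counter


def in_panic_mode(blocks):
    """Panic mode: no eraseblock is completely free (empty)."""
    return all(len(block) > 0 for block in blocks)


def find_copy_targets(blocks):
    """Return indices of blocks made up entirely of nodes that also
    appear, with the same (inode, version), in some other block."""
    # Count how many distinct blocks hold each node.
    seen_in = Counter(node for block in blocks for node in set(block))
    return [
        i for i, block in enumerate(blocks)
        if block and all(seen_in[node] > 1 for node in block)
    ]


blocks = [
    [(1, 3), (2, 5), (4, 1)],   # source of the interrupted copy
    [(7, 2)],
    [(1, 3), (2, 5)],           # partial copy: every node is a duplicate
]
```

 Here only the last block consists entirely of duplicates, so it is the
 one to erase before retrying the panic GC.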

 This also helps us deal with the uncertainty around garbage collection,
 where it was never mathematically proven that we were reserving enough
 space to allow GC to proceed under all circumstances. The ‘panic mode’
 should allow us to avoid getting stuck, as has occasionally been observed
 to happen.

-- 
Ticket URL: <http://dev.laptop.org/ticket/190>
One Laptop Per Child <http://laptop.org/>


