[Trac #190] JFFS2 power failure handling
Zarro Boogs per Child
bugtracker at laptop.org
Wed Oct 18 06:06:23 EDT 2006
#190: JFFS2 power failure handling
--------------------+-------------------------------------------------------
Reporter: dwmw2 | Owner: dwmw2
Type: defect | Status: new
Priority: normal | Milestone: CTest
Component: kernel | Keywords:
--------------------+-------------------------------------------------------
A pathological pattern of power cycles can cause JFFS2 to entirely run out
of space to perform garbage collection. For example, hook up an automatic
power switch triggered by the NAND controller and make it cut power in the
middle of each page write, unless it's the first page in an eraseblock.
Each eraseblock then holds only a _single_ valid node, so it can't be
erased -- but the rest of the space in each block is wasted by failed
write attempts. You are then left with no free space into which to
garbage collect, and you see nasty messages like ‘Argh. No free space
left for GC’.
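To make the failure pattern concrete, here is a toy model of it. This is
only a sketch and not JFFS2 code: the geometry, the function names and the
simplification that one page holds exactly one node are all assumptions
made for illustration.

```python
# Toy model of the pathological power-cut pattern (hypothetical names,
# assumed geometry; one page is treated as one node for simplicity).

def simulate_power_cuts(num_blocks=8, pages_per_block=64):
    """Fill every eraseblock while power is cut in the middle of each
    page write, except for the first page of each eraseblock."""
    blocks = []
    for _ in range(num_blocks):
        pages = []
        for page in range(pages_per_block):
            if page == 0:
                pages.append("valid")   # the only write that completes
            else:
                pages.append("wasted")  # interrupted write: space is consumed
        blocks.append(pages)
    return blocks


def summarize(blocks):
    """Count what the medium looks like after the power cuts."""
    return {
        # Blocks with nothing written to them at all.
        "free_blocks": sum(1 for b in blocks if len(b) == 0),
        # Blocks holding no valid node, which could simply be erased.
        "erasable_blocks": sum(1 for b in blocks if "valid" not in b),
        "valid_pages": sum(b.count("valid") for b in blocks),
        "wasted_pages": sum(b.count("wasted") for b in blocks),
    }
```

In this model every block ends up pinned by one valid node, so nothing is
free and nothing can be erased, even though almost all of the medium holds
no useful data.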
There are ways to deal with this -- you can reserve an extra eraseblock
for "panic" garbage collection, where you don't GC normally, but instead
you just _copy_ nodes intact from the least-used eraseblock, just as we
copy REF_PRISTINE nodes. If power is interrupted during this 'panic GC',
we can erase the target block and try again. If we manage to write the
whole thing, we can erase the _source_ block, which then becomes our panic
block for the future. The tail end of the block we've just written is
still free, so normal GC can proceed.
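The panic GC round described above could be sketched roughly as follows.
This is a toy model under stated assumptions, not a proposed patch: the
node dictionaries, `PowerFailure`, `write_hook` and `max_attempts` are all
hypothetical, "least-used" is interpreted as "fewest valid bytes", and the
retry loop stands in for what would really happen across a reboot.

```python
class PowerFailure(Exception):
    """Raised by the toy write hook to model power being cut mid-copy."""


def valid_bytes(block):
    """Total size of the valid nodes in a toy eraseblock."""
    return sum(node["size"] for node in block if node["valid"])


def panic_gc(blocks, panic_idx, write_hook=lambda node: None, max_attempts=10):
    """Run one round of panic GC on a toy medium.

    blocks    -- list of eraseblocks, each a list of node dicts
    panic_idx -- index of the reserved (empty) panic block
    Returns the index of the block that becomes the new panic block.
    """
    # Pick the block with the least valid data as the source.
    candidates = [i for i in range(len(blocks)) if i != panic_idx]
    source_idx = min(candidates, key=lambda i: valid_bytes(blocks[i]))
    target = blocks[panic_idx]

    for _ in range(max_attempts):
        try:
            for node in blocks[source_idx]:
                if node["valid"]:
                    write_hook(node)           # may raise PowerFailure
                    target.append(dict(node))  # copied intact, same version
        except PowerFailure:
            target.clear()                     # erase the target and try again
            continue
        # The whole copy succeeded: erase the source, which becomes the
        # panic block for the future.
        blocks[source_idx].clear()
        return source_idx
    raise RuntimeError("panic GC did not complete")
```

The source block is only erased once the complete copy exists, so at every
point in time at least one full copy of each valid node is on the medium.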
This relies on being able to recover properly after the power failure --
we have to:
1. Know that we're in panic mode, which is easy enough because there are
no free eraseblocks.
2. Find the block which we were copying into, which is also relatively
easy because it contains only nodes which are _duplicates_ of other nodes
on the medium, with precisely the same version number. Currently we
silently drop one of the duplicates, but we can change that to keep a
record so that we can find blocks which are entirely made up of such
nodes.
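The two recovery steps above might look something like this at mount time.
Again this is only a sketch: `find_panic_target` is a hypothetical name,
and identifying a node by an `(inode, version)` pair alone is a
simplification of what a real scan would record.

```python
from collections import Counter


def find_panic_target(blocks):
    """Return the index of the interrupted panic GC target, or None.

    blocks -- list of eraseblocks, each a list of (inode, version) keys
              for the valid nodes found while scanning that block
    """
    # Step 1: we are only in panic mode if there are no free eraseblocks.
    if any(len(block) == 0 for block in blocks):
        return None

    # Record how often each node key appears anywhere on the medium,
    # rather than silently dropping one of the duplicates.
    seen = Counter(key for block in blocks for key in block)

    # Step 2: the target is a block made up entirely of nodes that are
    # duplicated elsewhere with precisely the same version number.
    for idx, block in enumerate(blocks):
        if all(seen[key] > 1 for key in block):
            return idx
    return None
```

One case this sketch does not settle: if power fails after the last node is
copied but before the source is erased, and the source held nothing else,
both blocks consist entirely of duplicates and the function simply returns
the first one it finds.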
This also helps us deal with the uncertainty around garbage collection,
where it was never mathematically proven that we were reserving enough
space to allow GC to proceed under all circumstances. The ‘panic mode’
should allow us to avoid getting stuck, as has occasionally been observed
to happen.
--
Ticket URL: <http://dev.laptop.org/ticket/190>
One Laptop Per Child <http://laptop.org/>