#4184 BLOC First D: JFFS2 Dirent Anomaly
Zarro Boogs per Child
bugtracker at laptop.org
Wed Oct 17 12:50:24 EDT 2007
#4184: JFFS2 Dirent Anomaly
--------------------------------+-------------------------------------------
Reporter: wmb at firmworks.com | Owner: mstone
Type: defect | Status: new
Priority: blocker | Milestone: First Deployment, V1.0
Component: kernel | Version:
Resolution: | Keywords:
Verified: 0 |
--------------------------------+-------------------------------------------
Comment(by wmb at firmworks.com):
I have found the problem on my stock 616 installation too. The easy way
to check for the symptom is:
ls /versions/run/616/lib/modules/2.6.22-20071009.1.olpc.ec54a65da6de043/kernel/drivers/input/
If "joydev.ko" appears twice, the problem exists. The second copy of
joydev.ko is actually "joydev.ko\0", but you can't see the null in the
name.
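The same check can be scripted. This is only a sketch: the helper names
duplicate_names and check_directory are made up for illustration, and it
assumes that a directory listing hides the trailing null the same way "ls"
does, so that the bad dirent shows up as a repeated name.

```python
import os
from collections import Counter


def duplicate_names(names):
    """Return the names that occur more than once, sorted."""
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


def check_directory(path):
    """List `path` and return any entry names that appear more than once.

    A healthy directory never lists the same name twice, so a non-empty
    result suggests a dirent whose real name differs by a hidden byte.
    """
    return duplicate_names(os.listdir(path))
```

Calling check_directory() on the drivers/input directory above should
return an empty list on an unaffected system.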
Here is the sequence of operations that leads to the problem:
a) The Activation startup process creates "/versions/run/616" as a
"shallow copy" (tree of links) of "/versions/pristine/616" and reboots
with "/versions/run/616" as the virtual root.
b) During the execution of /etc/rc.d/rc.sysinit (Linux startup), around
the time that /sbin/start_udev is running, several kernel modules are
loaded, specifically cs5535_gpio, serio_raw, psmouse, ieee80211_crypt,
ieee80211, libertas, usb8xxx, joydev, mousedev, and i2c_dev.
c) For some unknown reason, reading those modules causes "copy-on-write
link breakage", i.e. the vserver code decides that it cannot leave those
modules as links to the pristine copies, but rather must create writable
copies of them.
d) For some other unknown reason, the copy process appears to happen twice
for joydev.ko (it happens only once for the other modules), at roughly the
same time (same 1-second timestamp on the JFFS2 dirents, dirents are close
together in the same JFFS2 erase block, with only the mousedev.ko dirents
intervening).
e) The copy-on-write process inside vserver involves creating a temp file
named "joydev.ko\251" (octal 251 is hex a9), copying the pristine file to
it, then renaming the temp file to "joydev.ko". See
fs/namei.c:cow_break_link().
f) The second time that this happens on the same file, the temp file is
renamed not to "joydev.ko", but instead to "joydev.ko\0", with a spurious
extra null on the end.
g) The JFFS2 garbage collector cannot scavenge the file with the null on
the end, because it sometimes uses strcmp() to match filenames (thus
treating the null as a terminator), and other times uses a hash over the
entire string length (thus including the null in the name). That confuses
it about whether or not the node has been scavenged. The end result is
that, instead of scavenging "joydev.ko\0", it creates an "infinite" number
of copies of the dirent, with the copies named "joydev.ko" without the
null.
h) There are several aspects to this bug. Fixing any of the first four
would probably make the bad effect (JFFS2 filling up with garbage) go away:
1) There is a bug in vserver whereby "simultaneous" attempts to break the
same link race, and the second one to finish creates a bad name. Bertl
has verified that this race exists, using a different, simpler test
script. Fixing that bug would eliminate the bogus filename, thus the
JFFS2 bug would not be triggered. (Note that the race does not always
result in the appending of a null to the file name; sometimes the name
gets garbled in other ways.)
2) There should be no need to break the links for these modules since
they are only read, not written. Eliminating that link-breaking would
suppress this particular manifestation of the problem - but the race
condition would still exist and might bite us in some other context.
3) We don't need the joydev module anyway, so it should be eliminated from
the kernel build. That too would hide the problem for now, but it might
come back later in another context.
4) JFFS2 garbage collection should be improved to be stable in the face of
such malformed filenames. Either that, or JFFS2 should refuse to create
dirents with embedded or trailing nulls, since they cannot be garbage
collected successfully.
5) It would be interesting to know why the link-breaking for joydev.ko
happens twice. It could be due to asynchronous modloading, or something
more subtle.
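The mismatch described in (g) can be illustrated in a few lines. This is a
sketch, not the JFFS2 code: CRC-32 stands in for whatever hash JFFS2
actually uses, and the helper names are invented for the example. The
point is only that a strcmp()-style comparison and a full-length hash
disagree about whether the two names are the same.

```python
import zlib


def c_string(name: bytes) -> bytes:
    """Return the part of `name` that a C string function would see."""
    return name.split(b"\0", 1)[0]


def strcmp_equal(a: bytes, b: bytes) -> bool:
    """Compare two names the way strcmp() would, stopping at the first null."""
    return c_string(a) == c_string(b)


def full_length_hash(name: bytes) -> int:
    """Hash every byte of the name, including any trailing null.

    CRC-32 is an assumption used for illustration only.
    """
    return zlib.crc32(name)


good = b"joydev.ko"
bad = b"joydev.ko\0"

# strcmp() says the names match; the full-length hash says they do not.
names_match = strcmp_equal(good, bad)
hashes_match = full_length_hash(good) == full_length_hash(bad)
```

Code that looks up a dirent by one rule and then confirms it by the other
can never agree with itself about "joydev.ko\0".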
In any case, problem (1) must be fixed, because it could cause filesystem
corruption of many different flavors, including files going "missing"
because their names got mangled.
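For reference, the copy-then-rename sequence from step (e) looks roughly
like this in user space. It is a sketch under stated assumptions, not the
code in cow_break_link(): break_link and TEMP_SUFFIX are names invented
here, and the sketch does not attempt to reproduce the race. It only shows
the single-caller sequence that the race disturbs.

```python
import os
import shutil

# Octal 251 (hex a9), the temp-name suffix described in step (e).
TEMP_SUFFIX = b"\xa9"


def break_link(path: bytes) -> None:
    """Replace the hard link at `path` with a private, writable copy.

    Sequence: copy the contents to "<path>\\251", then rename the temp
    file over the original name. The pristine file is left untouched.
    """
    temp = path + TEMP_SUFFIX
    with open(path, "rb") as src, open(temp, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # rename() replaces the directory entry for `path` in one step.
    os.rename(temp, path)
```

After break_link() returns, the name refers to a new inode and no temp
file is left behind.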
--
Ticket URL: <https://dev.laptop.org/ticket/4184#comment:11>
One Laptop Per Child <https://dev.laptop.org>
OLPC bug tracking system