#4184 BLOC First D: JFFS2 Dirent Anomaly

Zarro Boogs per Child bugtracker at laptop.org
Wed Oct 17 12:50:24 EDT 2007


#4184: JFFS2 Dirent Anomaly
-----------------------------------+-------------------------------------------
  Reporter:  wmb at firmworks.com  |       Owner:  mstone
      Type:  defect                |      Status:  new
  Priority:  blocker               |   Milestone:  First Deployment, V1.0
 Component:  kernel                |     Version:
Resolution:                        |    Keywords:
  Verified:  0                     |
-----------------------------------+-------------------------------------------

Comment(by wmb at firmworks.com):

 I have found the problem on my stock 616 installation too.  The easy way
 to check for the symptom is:

  ls /versions/run/616/lib/modules/2.6.22-20071009.1.olpc.ec54a65da6de043/kernel/drivers/input/

 If "joydev.ko" appears twice, the problem exists.  The second copy of
 joydev.ko is actually "joydev.ko\0", but you can't see the null in the
 name.
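
 The same check can be scripted. What follows is only a sketch: it assumes
 that the directory listing truncates each name at the null, so the bad
 entry shows up as a second, identical "joydev.ko". The function names are
 made up for illustration.

```python
import os
from collections import Counter


def repeated(names):
    """Return the names that occur more than once, sorted."""
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


def duplicate_names(directory):
    """Return names that os.listdir() reports more than once in `directory`.

    Names are handled as bytes so that no decoding step can hide or
    alter an odd entry.
    """
    return repeated(os.listdir(os.fsencode(directory)))
```

 Calling duplicate_names() on the drivers/input directory above should
 return an empty list on a healthy system, and a list containing
 b"joydev.ko" if the symptom is present.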

 Here is the sequence of operations that leads to the problem:

 a) The Activation startup process creates "/versions/run/616" as a
 "shallow copy" (tree of links) of "/versions/pristine/616" and reboots
 with "/versions/run/616" as the virtual root.

 b) During the execution of /etc/rc.d/rc.sysinit (Linux startup), around
 the time that /sbin/start_udev is running, several kernel modules are
 loaded, specifically cs5535_gpio, serio_raw, psmouse, ieee80211_crypt,
 ieee80211, libertas, usb8xxx, joydev, mousedev, and i2c_dev.

 c) For some unknown reason, reading those modules causes "copy-on-write
 link breakage", i.e. the vserver code decides that it cannot leave those
 modules as links to the pristine copies, but rather must create writable
 copies of them.

 d) For some other unknown reason, the copy process appears to happen twice
 for joydev.ko (it happens only once for the other modules), at roughly the
 same time: the two JFFS2 dirents carry the same 1-second timestamp and sit
 close together in the same JFFS2 erase block, with only the mousedev.ko
 dirents intervening.

 e) The copy-on-write process inside vserver involves creating a temp file
 named "joydev.ko\251" (octal 251 is hex a9), copying the pristine file to
 it, then renaming the temp file to "joydev.ko".  See
 fs/namei.c:cow_break_link().

 f) The second time that this happens on the same file, the temp file is
 renamed not to "joydev.ko", but instead to "joydev.ko\0", with a spurious
 extra null on the end.

 g) The JFFS2 garbage collector cannot scavenge the file with the null on
 the end, because it sometimes uses strcmp() to match filenames (thus
 treating the null as a terminator), and other times uses a hash over the
 entire string length (thus including the null in the name).  That confuses
 it about whether or not the node has been scavenged.  The end result is
 that, instead of scavenging "joydev.ko\0", it creates an "infinite" number
 of copies of the dirent, with the copies named "joydev.ko" without the
 null.

 h) There are four aspects to this bug (items 1 through 4 below), fixing
 any of which would probably make the bad effect (JFFS2 filling up with
 garbage) go away. Item 5 is an open question rather than a fix:

 1) There is a bug in vserver whereby "simultaneous" attempts to break the
 same link race, and the second one to finish creates a bad name.  Bertl
 has verified that this race exists, using a different, simpler test
 script.  Fixing that bug would eliminate the bogus filename, thus the
 JFFS2 bug would not be triggered. (Note that the race does not always
 result in the appending of a null to the file name; sometimes the name
 gets garbled in other ways.)

 2) There should be no need to break the links for these modules since
 they are only read, not written.  Eliminating that link-breaking would
 suppress this particular manifestation of the problem - but the race
 condition would still exist and might bite us in some other context.

 3) We don't need the joydev module anyway, so it should be eliminated from
 the kernel build.  That too would hide the problem for now, but it might
 come back later in another context.

 4) JFFS2 garbage collection should be improved to be stable in the face of
 such malformed filenames.  Either that, or JFFS2 should refuse to create
 dirents with embedded or trailing nulls, since they cannot be garbage
 collected successfully.

 5) It would be interesting to know why the link-breaking for joydev.ko
 happens twice. It could be due to asynchronous modloading, or something
 more subtle.
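
 To make the mismatch in (g) and the guard suggested in (4) concrete, here
 is a small userspace sketch. It is not JFFS2 code: the hash is a stand-in
 chosen for illustration, and all three function names are hypothetical.

```python
def strcmp_equal(a, b):
    """Compare two byte strings the way C strcmp() does: stop at the first NUL."""
    return a.split(b"\0", 1)[0] == b.split(b"\0", 1)[0]


def full_length_hash(name):
    """Stand-in hash over every byte of `name`, NUL included.

    This is NOT the hash JFFS2 uses; it only shows that a hash taken
    over the full length sees the trailing NUL.
    """
    h = 0
    for byte in name:
        # The +1 guarantees that a NUL byte still changes the value.
        h = (h * 31 + byte + 1) & 0xFFFFFFFF
    return h


def is_safe_dirent_name(name):
    """Guard in the spirit of (4): reject empty names and names with NUL or '/'."""
    return len(name) > 0 and b"\0" not in name and b"/" not in name


good = b"joydev.ko"
bad = b"joydev.ko\0"

# The two views disagree about whether these are the same name.
same_by_strcmp = strcmp_equal(good, bad)
same_by_hash = full_length_hash(good) == full_length_hash(bad)
```

 Here same_by_strcmp is True while same_by_hash is False, which is the kind
 of disagreement described in (g). A check like is_safe_dirent_name() at
 dirent creation time would turn the bad name away before it reaches the
 garbage collector.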

 In any case, problem (1) must be fixed, because it could cause filesystem
 corruption of many different flavors, including files going "missing"
 because their names got mangled.
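
 For reference, the copy-then-rename sequence from (e) and one way of
 keeping two breakers of the same link from racing can be sketched in
 userspace as below. This is an analogue only, not the vserver code in
 cow_break_link(): it assumes the links are hard links, uses a single
 process-wide lock where the kernel would need its own serialization, and
 break_link() and demo() are hypothetical names.

```python
import os
import shutil
import tempfile
import threading

# Assumption: one process-wide lock stands in for whatever per-file
# serialization a real fix would use.
_break_lock = threading.Lock()


def break_link(path, tmp_suffix=b"\xa9"):
    """Replace the hard link at `path` (bytes) with a private copy.

    Returns True if a copy was made, False if the link was already broken.
    """
    with _break_lock:
        if os.stat(path).st_nlink < 2:
            # A second caller finds nothing left to do.
            return False
        tmp = path + tmp_suffix       # e.g. "joydev.ko\251"
        shutil.copyfile(path, tmp)    # copy the pristine contents
        os.rename(tmp, path)          # swap the copy into place
        return True


def demo():
    """Break the same link twice; only the first call should copy."""
    with tempfile.TemporaryDirectory() as d:
        pristine = os.fsencode(os.path.join(d, "pristine.ko"))
        run = os.fsencode(os.path.join(d, "run.ko"))
        with open(pristine, "wb") as f:
            f.write(b"module data")
        os.link(pristine, run)
        # A plain ASCII suffix is used here only for portability.
        first = break_link(run, tmp_suffix=b".cowtmp")
        second = break_link(run, tmp_suffix=b".cowtmp")
        same_inode = os.stat(run).st_ino == os.stat(pristine).st_ino
        with open(run, "rb") as f:
            data = f.read()
        return first, second, same_inode, data
```

 With the check and the rename done under one lock, the second attempt sees
 a link count of one and returns without touching the name, so there is no
 second rename to go wrong.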

-- 
Ticket URL: <https://dev.laptop.org/ticket/4184#comment:11>
One Laptop Per Child <https://dev.laptop.org>
OLPC bug tracking system


