File systems usage patterns and NAND lifetime

Fri Oct 10 02:17:43 EDT 2008

I attended an Embedded Linux Conference [1] last week at which I
saw a great talk on "Managing NAND Over A Product Lifecycle" [2].

The speaker presented the case of determining whether a chosen
NAND HW and SW combination will survive the estimated lifecycle 
of a product. As an example, he used a GPS device his firm worked
on in which they had some very specific usage data such as:

- The average runtime for the device is 4 hours a day, during
  which we will see 100 bytes/second of application logs
  written, 2300 bytes written for the addressbook, and
  1KiB/second used for temporary storage as maps are
  decompressed.

- The user will on average update the map data from his/her
  PC often enough that it requires 3GiB of writes/quarter.

- OS and application updates require 32MiB/quarter.

There were many other data points, please refer to the slides
for full details.
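
To give a feel for the arithmetic, here is a rough sketch that adds
up only the data points quoted above over the 3 year lifetime. It is
not the speaker's model: the 365-day year is my assumption, the
addressbook figure is left out because I don't have its period, and
the slides contain many more inputs.

```python
# Back-of-the-envelope write volume from the figures quoted in this mail.
# Assumptions (not from the talk): 365 days/year, addressbook writes ignored.

ACTIVE_SECONDS_PER_DAY = 4 * 3600          # 4 hours of runtime a day
LOG_BYTES_PER_SEC = 100                    # application logs
TEMP_BYTES_PER_SEC = 1024                  # 1KiB/s temporary map storage
MAP_BYTES_PER_QUARTER = 3 * 1024**3        # 3GiB map updates
UPDATE_BYTES_PER_QUARTER = 32 * 1024**2    # 32MiB OS/application updates

LIFETIME_YEARS = 3
DAYS_PER_YEAR = 365                        # assumption
QUARTERS_PER_YEAR = 4


def lifetime_writes():
    """Return bytes written over the product lifetime, split by source."""
    days = LIFETIME_YEARS * DAYS_PER_YEAR
    quarters = LIFETIME_YEARS * QUARTERS_PER_YEAR

    runtime = (LOG_BYTES_PER_SEC + TEMP_BYTES_PER_SEC) * ACTIVE_SECONDS_PER_DAY * days
    periodic = (MAP_BYTES_PER_QUARTER + UPDATE_BYTES_PER_QUARTER) * quarters
    return {"runtime": runtime, "periodic": periodic, "total": runtime + periodic}


if __name__ == "__main__":
    for name, nbytes in lifetime_writes().items():
        print(f"{name:>8}: {nbytes / 1024**3:.2f} GiB")
```

Even with these few inputs, the quarterly map updates dominate the
steady trickle of logs and temporary data.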

With this data, they were able to generate an I/O model of the
application that was used to drive nandsim, an in-kernel NAND device
simulator. By doing this, they were able to replicate the product's
expected lifetime before user replacement (3 years) in a matter of a
few days. nandsim + the UBI reporting mechanisms were used to generate
detailed reports of the wear leveling behaviour of the system, how the
filesystem reacted to bitflips, bad pages, etc. Using this they were
able to determine how to lay out their filesystem to meet the lifecycle
requirement. After this was done, the same I/O model was used
to rapidly drive a real device toward failure modes to see how it
would react. If it didn't survive for the expected lifecycle,
they could analyze the data and figure out what settings to tweak.
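
As a first-order sanity check before any simulation, one can compare
the expected write volume against the rated erase endurance. The
sketch below assumes perfectly even wear leveling, which real systems
only approximate (that gap is what the nandsim/UBI reports measure).
The function names and every number in the example are hypothetical,
not taken from the talk.

```python
# Idealised endurance check: assumes every erase block wears evenly.
# All parameter values used below are hypothetical illustrations.

def average_erase_cycles(total_bytes_written, device_bytes, write_amplification=1.0):
    """Average erase cycles per block if writes are spread perfectly evenly."""
    return total_bytes_written * write_amplification / device_bytes


def survives(total_bytes_written, device_bytes, rated_cycles, write_amplification=1.0):
    """True if the idealised average wear stays within the rated endurance."""
    cycles = average_erase_cycles(total_bytes_written, device_bytes, write_amplification)
    return cycles <= rated_cycles


if __name__ == "__main__":
    total = 50 * 1024**3      # hypothetical: 50GiB written over the lifetime
    device = 512 * 1024**2    # hypothetical: 512MiB NAND part
    wa = 2.0                  # hypothetical write amplification factor
    rated = 10_000            # hypothetical rated erase cycles per block

    print("average erase cycles:", average_erase_cycles(total, device, wa))
    print("within rating:", survives(total, device, rated, wa))
```

A comfortable average says nothing about hot spots such as a journal
or log that keeps rewriting the same small area, which is why the
simulated and real-device runs described above are still needed.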

In this talk I also learned about the MLC NAND property of "read 
disturbance", where a read to one page can cause a bit-flip on an 
adjacent page.

I found the talk fascinating, and it has made me wonder whether we
have any idea what our typical deployed usage patterns might
look like. How often does the journal write to disk and how
big is each write? How often do systems reboot and
require a full filesystem read vs simply suspending/resuming?

Related to this topic, I am also wondering what the expected
usable life of the XO is. We're used to product replacement every few
years, sometimes faster depending on the product segment, but I
doubt countries that are investing $millions expect to only get
2-3 years of use out of the XO.

~Deepak

[1] http://mvista.com/vision/
[2] http://www.mvista.com/download/fetchdoc.php?docid=329

-- 
Deepak Saxena - Kernel Developer - dsaxena at laptop.org