File systems usage patterns and NAND lifetime

John Watlington wad at laptop.org
Fri Oct 10 10:04:13 EDT 2008


On Oct 10, 2008, at 2:17 AM, Deepak Saxena wrote:

> I attended an Embedded Linux Conference [1] last week at which I
> saw a great talk on "Managing NAND Over A Product Lifecycle" [2].
>
> The speaker presented the case of determining whether a chosen
> NAND HW and SW combination will survive the estimated lifecycle
> of a product. As an example, he used a GPS device his firm worked
> on in which they had some very specific usage data such as:
>
> - The average runtime for the device is 4 hours a day, during
>   which we will see 100 bytes/second of application logs
>   written, 2300 bytes written for the addressbook, and
>   1KiB/second used for temporary storage as maps are
>   decompressed.
>
> - The user will on average update the map data from his/her
>   PC such that it requires 3GiB of writes/quarter.
>
> - OS and application updates require 32MiB/quarter.
>
> There were many other data points, please refer to the slides
> for full details.
>
> With this data, they were able to generate an I/O model of the
> application that was used to drive nandsim, an in-kernel NAND device
> simulator. By doing this, they were able to replicate the product's
> expected lifetime before user replacement (3 years) in a matter of a
> few days. nandsim + the UBI reporting mechanisms were used to generate
> detailed reports of the wear leveling behaviour of the system, how the
> filesystem reacted to bitflips, bad pages, etc. Using this they were
> able to determine how to lay out their filesystem to meet the
> lifecycle requirement. After this was done, the same I/O model was
> used to rapidly drive a real device toward failure modes to see how it
> would react. If it didn't survive for the expected lifecycle,
> they could analyze the data and figure out what settings to tweak.
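
As a rough illustration of the arithmetic behind an I/O model like the
one quoted above, the figures can be totalled over the three-year
lifetime.  This is only a sketch: the names are made up for this
example, the 2300-byte addressbook write is assumed to happen once per
day (the quote gives no rate), and a quarter is taken as a fourth of a
365-day year.

```python
# Rough logical write volume from the quoted GPS-device figures.
KIB, MIB, GIB = 1024, 1024**2, 1024**3

RUNTIME_S_PER_DAY = 4 * 3600          # 4 hours of use per day
LOG_BYTES_PER_S = 100                 # application logs
TEMP_BYTES_PER_S = 1 * KIB            # temporary storage for map decompression
ADDRESSBOOK_BYTES_PER_DAY = 2300      # ASSUMPTION: written once per day
MAP_BYTES_PER_QUARTER = 3 * GIB       # map data updates
UPDATE_BYTES_PER_QUARTER = 32 * MIB   # OS and application updates
LIFETIME_YEARS = 3


def lifetime_bytes_written(years=LIFETIME_YEARS):
    """Total bytes the application asks the flash to store over `years`."""
    daily = (RUNTIME_S_PER_DAY * (LOG_BYTES_PER_S + TEMP_BYTES_PER_S)
             + ADDRESSBOOK_BYTES_PER_DAY)
    quarterly = MAP_BYTES_PER_QUARTER + UPDATE_BYTES_PER_QUARTER
    return daily * 365 * years + quarterly * 4 * years


total = lifetime_bytes_written()
print(f"{total / GIB:.1f} GiB written over {LIFETIME_YEARS} years")
```

Under these assumptions the total comes to about 53 GiB of logical
writes.  It ignores write amplification and filesystem overhead, which
is the part a simulator such as nandsim is needed for.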

Did he discuss trying to use the same approach for "managed NAND", where
there is no visibility into the wear levelling?  Given the difference
in manufacturing volumes, I fear that raw NAND will go the way of SCSI
disks (same basic storage medium, twice the price).

> In this talk I also learned about the MLC NAND property of "read
> disturbance", where a read to one page can cause a bit-flip on an
> adjacent page.

And write disturbs are a much bigger problem in MLC than in SLC NAND.

> I found the talk fascinating and it has made me wonder if we
> have any idea what our typical deployed usage patterns might
> look like?  How often does the journal write to disk and how
> big is each write?  How often do systems reboot and
> require a full filesystem read vs simply suspending/resuming?

Unfortunately, the XO is a general purpose device.   There is a
huge difference in storage patterns between a kid that is just using
the laptop for reading and writing, and a kid that is trying to be
the next Stanley Kubrick.

There was talk of getting "typical profiles" for disk usage for the XO,
and using them to drive some device testing (and I'm still willing to
do so).  Until such profiles are available, I'm simulating the case of a
kid who fills up their laptop with data they don't want to delete, and
then keeps acquiring data and deleting it.  If naive wear levelling
(just using unused blocks) is used, this is a worst case for device
wear.
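
To see why a nearly full device is the worst case, here is a toy model.
It is an illustrative sketch only, not the test actually being run: the
block counts, the round-robin allocation, and the function name are all
assumptions made up for this example.  It compares naive levelling,
where blocks holding static data are never reused, against an idealised
scheme that spreads erases over every block.

```python
# Toy model of erase counts on a mostly full device.

def simulate(total_blocks, static_blocks, block_writes, naive=True):
    """Return per-block erase counts after `block_writes` block rewrites."""
    erases = [0] * total_blocks
    if naive:
        # Static data pins the first `static_blocks` blocks forever, so
        # only the remaining blocks ever get erased and rewritten.
        pool = list(range(static_blocks, total_blocks))
    else:
        # Idealised: static data is relocated so every block shares the
        # wear (the extra writes needed to move it are ignored here).
        pool = list(range(total_blocks))
    for i in range(block_writes):
        erases[pool[i % len(pool)]] += 1   # round-robin over the pool
    return erases


# Hypothetical device: 1000 blocks, 95% full of data that never changes.
naive_counts = simulate(1000, 950, 100_000, naive=True)
ideal_counts = simulate(1000, 950, 100_000, naive=False)
print("max erases per block, naive:", max(naive_counts))
print("max erases per block, ideal:", max(ideal_counts))
```

With 95% of the blocks pinned, the same number of writes lands on one
twentieth of the device, so the busiest blocks see twenty times the
erase count of the idealised case.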

At the same time, I'm trying to get a measurement for error rates.
So far, unless you count the repeated kernel crashes as errors,
we've only seen what looks like SD bus errors (the data in the
device didn't change, but a read returned erroneous data).

> Related to this topic, I am also wondering what is the expected
> usable life of the XO? We're used to product replacement every few
> years, sometimes faster depending on the product segment, but I
> doubt countries that are investing $millions expect to only get
> 2-3 years of use out of the XO.

The desired lifetime of the XO is five years.   But all the components
that would wear out after that time (main battery, backlight, keyboard)
are easily replaceable.    If the NAND dies, it kills the most expensive
component.

Unfortunately, devices are getting less reliable faster than they are
growing in size.   Upcoming devices expect write/erase cycle lifetimes
of 5 - 10K instead of the 100K expected of our current SLC NAND.
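
A back-of-the-envelope sketch of what that drop in cycle count means
for lifetime.  The 1 GiB capacity and the 2 GiB/day write load are
illustrative assumptions, not measurements, and the formula assumes
perfect wear levelling across the whole device.

```python
# Idealised lifetime estimate: total erase budget / daily write volume.
GIB = 1024**3


def years_until_worn_out(capacity_bytes, pe_cycles, bytes_per_day,
                         write_amplification=1.0):
    """Years until every block has used up its write/erase cycles."""
    total_writable = capacity_bytes * pe_cycles
    physical_per_day = bytes_per_day * write_amplification
    return total_writable / physical_per_day / 365


# ASSUMPTIONS: 1 GiB device, 2 GiB written per day, no write amplification.
for cycles in (100_000, 10_000, 5_000):
    years = years_until_worn_out(1 * GIB, cycles, 2 * GIB)
    print(f"{cycles:>7} cycles: {years:.1f} years")
```

Lifetime scales linearly with the cycle count, so going from 100K to 5K
cycles cuts it by a factor of twenty at the same capacity.  Any
unevenness in the wear levelling or write amplification above 1.0
shortens it further.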

Thanks for the links,
wad



