[OLPC-devel] NAND Flash Performance: IDE Interface Hack

Sun Jun 18 07:27:05 EDT 2006

Mark,

On Sat, 2006-06-17 at 13:14 -0700, Mark J. Foster wrote:
> I'm moving forward on a new ASIC that I call CAFE.  I haven't been 
> posting too much about it yet, because if we go this route, it would 
> probably also contain a couple of additional peripheral interfaces that 
> we're not ready to commit to for inclusion in the machine just yet.  If 
> these interfaces are included, though, combining the NAND controller 
> with the new interfaces yields excellent pricing.

Let me know when you need input on the NAND features.

> The progress in this regard is actually quite promising, with a vendor 
> that wants to make it happen, but we're obviously facing a heinously 
> tight schedule.  Within a few days, we should know if this is a real 
> solution or not.  Performance-wise, this solution would connect via the 
> PCI bus, so it should deliver near-optimal performance for a 33 MHz PCI 
> bus (66 MHz isn't worth the power tradeoff).

I think that's OK, as long as we can do DMA. An interesting question is
whether we can achieve overlapping operation of DMA and readout. That
would give us the optimal read speed. The flow would be:

CPU			Controller
Read command		
			Setup readout
Wait for ready/busy	Wait for ready/busy
Start DMA		Start readout
			When readout threshold reached, start DMA
Wait for DMA

We need one clock cycle to read out 1 byte, provided we can control
the /RE pulse width magically in the controller, so the threshold is 3/4
of the data to read: (2048 + 64 bytes) * 3/4 =~ 1660 bytes.

We need to read out the ECC code and status, which adds another 36 bytes,
so with a total of 2148 bytes we end up with 540 cycles of DMA in the
best case.
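
The byte and cycle counts above can be checked with a few lines of
Python. This is only a sketch: it assumes a 32-bit PCI bus moving 4
bytes per clock (the mail does not state the bus width) and a NAND
readout of 1 byte per clock. The function names are made up for
illustration. Under these assumptions the straight 3/4 product comes out
at 1584 bytes, a little below the rough 1660 figure, and the DMA needs
537 cycles, which rounds to the 540 quoted.

```python
# Sketch of the readout-threshold and DMA-cycle arithmetic.
PAGE_BYTES = 2048          # main data area of one NAND page
OOB_BYTES = 64             # spare (out-of-band) area
ECC_STATUS_BYTES = 36      # ECC code and status added by the controller
NAND_BYTES_PER_CLOCK = 1   # one /RE pulse per clock (from the mail)
PCI_BYTES_PER_CLOCK = 4    # ASSUMPTION: 32-bit PCI, one word per clock


def dma_start_threshold(nbytes):
    """Bytes that must be read out before DMA may start.

    Readout needs nbytes / NAND rate clocks, DMA needs nbytes / PCI rate
    clocks. Starting DMA after nbytes * (1 - NAND rate / PCI rate) bytes
    lets both finish together, so DMA never overtakes the readout.
    """
    faster_by = PCI_BYTES_PER_CLOCK - NAND_BYTES_PER_CLOCK
    return nbytes * faster_by // PCI_BYTES_PER_CLOCK


def dma_cycles(nbytes):
    """PCI clocks needed to move nbytes, rounded up to whole clocks."""
    return -(-nbytes // PCI_BYTES_PER_CLOCK)


threshold = dma_start_threshold(PAGE_BYTES + OOB_BYTES)
total_bytes = PAGE_BYTES + OOB_BYTES + ECC_STATUS_BYTES
cycles = dma_cycles(total_bytes)

print("threshold bytes:", threshold)
print("total bytes:    ", total_bytes)
print("DMA cycles:     ", cycles)
```

Changing PCI_BYTES_PER_CLOCK shows how strongly the threshold depends
on the ratio between the two transfer rates.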

The total read time for one page would then accumulate to:

20us command ready time
50us read ahead
17us DMA
-----------------------
87us / page

So the theoretical maximum raw transfer speed would be something around
23MiB/s. Adding the memcpy, which brings the stuff into the cache, and
the ECC checks, we might end up with about 16MiB/s real - given that my
math is halfway correct :)
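
For anyone who wants to redo the sum, here is a small sketch of the
per-page time and the resulting raw rate. The three time components are
the ones listed above; whether the raw rate counts only the 2048 data
bytes or also the 64 spare bytes is not stated, so both are computed.
The helper names are hypothetical.

```python
# Sketch: per-page read time and raw throughput from the figures above.
COMMAND_READY_US = 20   # command ready time
READ_AHEAD_US = 50      # readout until the DMA threshold is reached
DMA_US = 17             # DMA transfer
MIB = 1024 * 1024


def page_time_us():
    """Total time to read one page, in microseconds."""
    return COMMAND_READY_US + READ_AHEAD_US + DMA_US


def throughput_mib_s(payload_bytes, time_us):
    """Raw transfer rate in MiB/s for payload_bytes moved in time_us."""
    return payload_bytes / (time_us * 1e-6) / MIB


t = page_time_us()
data_only = throughput_mib_s(2048, t)        # ASSUMPTION: data area only
data_and_oob = throughput_mib_s(2048 + 64, t)  # ASSUMPTION: data + spare

print("page time: %d us" % t)
print("data only:    %.1f MiB/s" % data_only)
print("data + spare: %.1f MiB/s" % data_and_oob)
```

Both variants land between 22 and 24 MiB/s, in line with the estimate
of around 23MiB/s. The 16MiB/s figure depends on memcpy and ECC costs
that are not quantified here, so it is not modelled.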

	tglx