[OLPC-devel] Re: Software action items and status
David Woodhouse
David at Woodhou.se
Fri Jun 9 09:51:49 EDT 2006
On Thu, 2006-06-08 at 23:28 -0400, Jim Gettys wrote:
> 1) Flash interface alternatives for slow flash reading. (dwmw2/jg)
> Dave and Jim to get together a list of possible alternatives here,
> so Mark can cost out and see what might be done to improve this.
Right... here are the basic alternatives:
1). Leave it as it is, using the CS5536 NAND controller. We get access
to the flash about an order of magnitude slower than it should be;
2.7MiB/s instead of 26MiB/s.
2). Leave the hardware as it is but switch to 66MHz PCI. We ought to get
about 3.5MiB/s from it -- still fairly crap. And there are power
consumption implications of switching to 66MHz -- do we know
precisely what difference that makes?
3). The IDE MDMA hack that Tom has been looking at. The documentation on
the IDE timing MSR (ATAC_CH0D0_DMA) seems to be explicit that it's
66MHz cycles, so by setting tKR and tDR both to zero (i.e. one cycle
for each of 'active' and 'recovery' time) we ought to be able to
do 60ns per cycle.
Unfortunately, we get 16 bits of data for each IDE cycle and the
chip provides only 8 bits. We have to post-process the buffer,
picking out the alternate bytes that we actually want, and
discarding the line noise on the upper 8 bits of each transfer.
Tom -- does that look like an accurate summary? And am I right in
interpreting the docs as saying it's _always_ measured in 66MHz
cycles, even when PCI is at 33MHz?
We'd get just under 16MiB/s raw _buffer_ read speed from that, under
ideal conditions. That's not _quite_ a number we can compare
directly with the above 2.7MiB/s and 3.5MiB/s figures, since it
doesn't include command time or anything like that, and neither does
it include picking the alternate bytes from the buffer when it's
arrived (which will also mean bringing it into dcache from RAM).
For comparison, raw _buffer_ read speed from the flash chip would
be about 38MiB/s. So we're still a way off the ideal.
It also doesn't account for the fact that we then have to do ECC
in software, and there are other 'interesting' details about the
abuse of the IDE interface which may slow us down. Tom should have
more details on precisely how this goes together for us, some time
soon.
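To make the post-processing step in option 3 concrete, here is a minimal sketch of picking the wanted bytes out of the DMA buffer. It is not Tom's code. The function name pick_low_bytes is hypothetical, and it assumes the buffer holds little-endian 16-bit words, so that the valid low byte of each transfer sits at an even offset and the noise byte at the odd offset after it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from any existing driver.
 *
 * Assumption: each 16-bit IDE transfer lands in memory little-endian,
 * so raw[2*i] is the byte the NAND chip actually drove and raw[2*i + 1]
 * is the undriven upper half of the bus.
 *
 * Copies the useful bytes into dst and returns how many were copied.
 * dst must have room for raw_len / 2 bytes.
 */
static size_t pick_low_bytes(uint8_t *dst, const uint8_t *raw, size_t raw_len)
{
    size_t n = raw_len / 2;   /* one useful byte per 16-bit transfer */
    size_t i;

    assert(dst != NULL && raw != NULL);

    for (i = 0; i < n; i++)
        dst[i] = raw[2 * i];  /* keep the even offset, drop the odd one */

    return n;
}
```

Every byte of the raw buffer has to be pulled through the cache to do this, and the raw buffer is twice the size of the data we want, which is why this step eats into the 16MiB/s raw figure.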
4). Our own CPLD/FPGA
5). Our own ASIC
I'll bundle these together since I'm not in a position to make a
distinction between the two. That's about the up-front costs vs.
the per-unit costs, and the scheduling (and testing) constraints.
The technical issues from my PoV are very similar.
Basically, the idea is that we attach our own NAND flash
controller to replace the one in the CS5536. It's relatively
simple -- just a FIFO and some Reed-Solomon ECC calculation.
Thomas has a working implementation of this in a CPLD which
is freely licensed, and would just need adapting to interface
to our board. Getting it all working in a CPLD and then transferring
that to an ASIC should be relatively low-risk.
We ought to be able to get _very_ close to the full 26MiB/s read
speed of the chip (and also to its write speed) by doing this, and
I think it's the option that I'd prefer; cost permitting.
One question which I hope the AMD guys can help us with is _how_
we interface to the CPU. Do we do it as a PCI device? A GeodeLink
device? Something else?
One possibility is that we could follow on from Tom's idea of
abusing IDE. Except that with our own CPLD/FPGA/ASIC the whole thing
is far less Heath Robinson; we can have proper 16-bit transfers, we
can still have hardware ECC, etc. One advantage of this is that the
OLPC board is already laid out to allow a chip to sit between the
IDE interface and the NAND chip. We can even do UDMA -- Thomas says
that "it should be no big deal to hack the VHDL glue for that".
6). Abandon direct access to flash, and use the PS3002 which Quanta put
down pads for.
There are serious problems with the approach of letting something
like the PS3002 'fake' a normal 512-byte-sector block device using
NAND flash. By layering a 'normal' file system on top of the
internal pseudo-filesystem implemented by the PS3002, we end up with
a fairly inefficient mess. We don't even get to tell the PS3002 when
certain 'sectors' are no longer used by the file system, so it'll
continue to garbage collect those sectors; copying them around on
the NAND flash even though they're no longer used. We also end up
with two 'layers' of journalling, and the journal of something like
the ext3 file system would be horridly inefficient atop the PS3002's
"block device", as we repeatedly write sectors to the 'disk' twice
in quick succession rather than just dealing with the underlying
flash directly.
I think this is an absolute last resort; we'd probably be better
off with option #1 or #2 than this.
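The throughput figures quoted in the options above can be cross-checked with a little arithmetic. The sketch below converts a per-byte transfer time into MiB/s and compares two rates. The helper names mib_per_sec and slowdown are made up for illustration. The only timing taken from the text is the 60ns cycle of option 3, which delivers one useful byte per cycle.

```c
#include <assert.h>

/*
 * Hypothetical helper: raw transfer rate in MiB/s when one useful
 * byte arrives every ns_per_byte nanoseconds. Ignores command time,
 * ECC and any post-processing, as the "raw buffer" figures do.
 */
static double mib_per_sec(double ns_per_byte)
{
    assert(ns_per_byte > 0.0);
    return (1e9 / ns_per_byte) / (1024.0 * 1024.0);
}

/* Hypothetical helper: how many times slower 'actual' is than 'ideal'. */
static double slowdown(double ideal_mib_s, double actual_mib_s)
{
    assert(actual_mib_s > 0.0);
    return ideal_mib_s / actual_mib_s;
}
```

mib_per_sec(60.0) comes out at roughly 15.9, which matches the "just under 16MiB/s" estimate for the MDMA hack. slowdown(26.0, 2.7) is roughly 9.6, the "order of magnitude" in option 1. As a guess only, a 25ns per-byte cycle on the chip side would give about 38MiB/s; the 25ns value is an assumption chosen because it reproduces the quoted figure, not something stated here.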
Overall, I think #5 is the best answer -- test it out with a CPLD and
then commit it to silicon. Comments?
--
dwmw2