Opportunity for speedup

Thu Feb 19 13:49:56 EST 2009

Bobby Powers wrote:
> On Thu, Feb 19, 2009 at 1:22 PM, C. Scott Ananian <cscott at laptop.org> wrote:
>   
>> I'd suggest just uncompressing the various image files and re-timing
>> as a start.  The initial implementation was uncompressed, but people
>> complained about space usage on the emulator images (which are
>> uncompressed).  The current code supports both uncompressed and
>> compressed image formats.  For uncompressed images, putting the bits
>> on the screen is an mmap and memcpy, so I can't imagine any
>> implementation being faster than that (it's possible, of course, that
>> what's stealing CPU is the shell's invocation of the client program;
>> recoding just that little part in C should be trivial, since it does
>> nothing but write to a socket IIRC.)
>>
>> Anyway, further benchmarking of the current implementation is probably
>> worthwhile before a complete reimplementation is called for.  But if
>> you want to reimplement it from scratch, go nuts.
>>  --scott
>>     
>
> I already re-implemented it - it was a fun optimization project and
> introduction to lower level systems programming.  Using Mitch's D565
> format to keep track of only the parts of the image that change cut
> down the implementation size significantly.  It's now only 2
> uncompressed images (frame00.565 and ul-warning.565), and <300KB of
> differences for the animation sequence.  I understand reads from video
> memory (which I think is what the framebuffer is?) can be extremely
> slow, so it could turn out faster to open a D565 file, mmap it and
> memcpy the several tens of kilobytes of differences to the framebuffer
> than it is to read those differences from one part of video memory to
> another.
>   

It is easy to measure just how "slow" video memory reads are.  Let's test 
256K (0x40000):

ok screen-ih iselect
ok t(  frame-buffer-adr   frame-buffer-adr 4.0000 +  4.0000  move  )t
56,272 uS

Conversely, for memory to frame buffer:

ok t(  load-base   frame-buffer-adr 4.0000 +  4.0000  move  )t
05,407 uS

So frame buffer reads are about ten times slower than reads from main 
memory.  But the total amount of time that we have "wasted" is about 
50 milliseconds over the whole procedure.  I suspect that it will be 
difficult to come up with a way to save those 50 ms that doesn't cost 
a similar amount of time in setup.
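
As a rough check of that figure, the arithmetic can be sketched in a few lines of Python.  The two timings and the 0x40000 size are taken from the measurements above; the variable names are only illustrative, and "MB" is assumed to mean 10^6 bytes.

```python
# Figures copied from the two timings above; the names are illustrative only.
SIZE_BYTES = 0x40000        # 256K moved in each test
FB_TO_FB_US = 56_272        # frame buffer -> frame buffer
MEM_TO_FB_US = 5_407        # main memory (load-base) -> frame buffer


def mb_per_s(size_bytes, elapsed_us):
    """Bytes per microsecond is the same number as 10^6 bytes per second."""
    return size_bytes / elapsed_us


# Extra time spent because the source of the copy was the frame buffer.
extra_us = FB_TO_FB_US - MEM_TO_FB_US
# How many times slower the frame-buffer-sourced copy was.
ratio = FB_TO_FB_US / MEM_TO_FB_US

print(f"extra time reading from the frame buffer: {extra_us} us")
print(f"slowdown factor: {ratio:.1f}x")
print(f"fb -> fb:  {mb_per_s(SIZE_BYTES, FB_TO_FB_US):.1f} MB/s")
print(f"mem -> fb: {mb_per_s(SIZE_BYTES, MEM_TO_FB_US):.1f} MB/s")
```

The difference works out to 50,865 uS, which is where the "50 milliseconds" comes from, and only if a full 256K really is read back from the frame buffer.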

For ongoing stuff like run-time graphics operations, it's clearly 
important to avoid "slow" operations, but in this case, we are trading 
off slow FB accesses against the complexity of maintaining persistent 
state in main memory.

> This is where benchmarking should give some clearer answers.
>
> yours,
> Bobby
>