Opportunity for speedup
Mitch Bradley
wmb at laptop.org
Thu Feb 19 13:49:56 EST 2009
Bobby Powers wrote:
> On Thu, Feb 19, 2009 at 1:22 PM, C. Scott Ananian <cscott at laptop.org> wrote:
>
>> I'd suggest just uncompressing the various image files and re-timing
>> as a start. The initial implementation was uncompressed, but people
>> complained about space usage on the emulator images (which are
>> uncompressed). The current code supports both uncompressed and
>> compressed image formats. For uncompressed images, putting the bits
>> on the screen is an mmap and memcpy, so I can't imagine any
>> implementation being faster than that (it's possible, of course, that
>> what's stealing CPU is the shell's invocation of the client program;
>> recoding just that little part in C should be trivial, since it does
>> nothing but write to a socket IIRC.)
>>
>> Anyway, further benchmarking of the current implementation is probably
>> worthwhile before a complete reimplementation is called for. But if
>> you want to reimplement it from scratch, go nuts.
>> --scott
>>
>
> I already re-implemented it - it was a fun optimization project and
> introduction to lower level systems programming. Using Mitch's D565
> format to keep track of only the parts of the image that change cut
> down the implementation size significantly. It's now only 2
> uncompressed images (frame00.565 and ul-warning.565), and <300KB of
> differences for the animation sequence. I understand reads from video
> memory (which I think is what the framebuffer is?) can be extremely
> slow, so it could turn out faster to open a D565 file, mmap it and
> memcpy the several tens of kilobytes of differences to the framebuffer
> than it is to read those differences from one part of video memory to
> another.
>
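To make the "mmap it and copy the differences" idea concrete, here is a minimal sketch in Python. The record layout (a 4-byte offset and a 4-byte length, followed by the payload) is made up for illustration and is not the actual D565 format, and an anonymous mapping stands in for the frame buffer so the example runs anywhere.

```python
import mmap
import struct

# Hypothetical record header: little-endian byte offset, then byte length.
# This is NOT the real D565 layout, only a stand-in for illustration.
HEADER = struct.Struct("<II")


def apply_diffs(framebuffer, diffs):
    """Copy every (offset, length, payload) record from diffs into framebuffer.

    framebuffer is any writable buffer that supports slice assignment
    (an mmap or a bytearray); diffs is any readable bytes-like object.
    Returns the number of payload bytes written.
    """
    src = memoryview(diffs)
    pos = 0
    written = 0
    while pos < len(src):
        offset, length = HEADER.unpack_from(src, pos)
        pos += HEADER.size
        # Slice assignment copies the payload without reading the destination.
        framebuffer[offset:offset + length] = src[pos:pos + length]
        pos += length
        written += length
    return written


# Anonymous 16-byte mapping standing in for the frame buffer (assumption).
fb = mmap.mmap(-1, 16)
diffs = HEADER.pack(2, 3) + b"abc" + HEADER.pack(10, 2) + b"xy"
count = apply_diffs(fb, diffs)
```

The point of the sketch is that the frame buffer is only ever written, never read; whether that beats a frame-buffer-to-frame-buffer copy in practice is what the timings have to show.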
It is easy to measure just how "slow" video memory reads are. Let's test
256K (0x40000):
ok screen-ih iselect
ok t( frame-buffer-adr frame-buffer-adr 4.0000 + 4.0000 move )t
56,272 uS
Conversely, for memory to frame buffer:
ok t( load-base frame-buffer-adr 4.0000 + 4.0000 move )t
05,407 uS
So frame buffer reads are about ten times slower. But the total amount
of time that we have "wasted" is about 50 milliseconds over the whole
procedure. I suspect that it will be difficult to come up with a way to
save those 50 ms that doesn't cost a similar amount of time in setup.
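For anyone who wants the arithmetic behind those figures, here is a small Python sketch that turns the two timings into throughput, ratio, and time difference. The only inputs are the numbers from the two runs above; the variable and function names are illustrative.

```python
# Figures copied from the two timed runs above.
BYTES_COPIED = 0x40000   # 256 KiB
FB_TO_FB_US = 56_272     # frame buffer -> frame buffer, microseconds
MEM_TO_FB_US = 5_407     # main memory -> frame buffer, microseconds


def mb_per_second(num_bytes, microseconds):
    """Throughput in units of 10**6 bytes per second."""
    # Bytes per microsecond is numerically the same as 10**6 bytes per second.
    return num_bytes / microseconds


fb_rate = mb_per_second(BYTES_COPIED, FB_TO_FB_US)
mem_rate = mb_per_second(BYTES_COPIED, MEM_TO_FB_US)
slowdown = FB_TO_FB_US / MEM_TO_FB_US
extra_ms = (FB_TO_FB_US - MEM_TO_FB_US) / 1000

print(f"fb  -> fb: {fb_rate:.1f} MB/s")
print(f"mem -> fb: {mem_rate:.1f} MB/s")
print(f"slowdown : {slowdown:.1f}x")
print(f"extra    : {extra_ms:.1f} ms per 256 KiB copied")
```

That works out to roughly 4.7 MB/s against 48.5 MB/s, with about 51 ms of difference for each 256 KiB moved.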
For ongoing stuff like run-time graphics operations, it's clearly
important to avoid "slow" operations, but in this case, we are trading
off slow FB accesses against the complexity of maintaining persistent
state in main memory.
> This is where benchmarking should give some clearer answers.
>
> yours,
> Bobby
>