The "iGoogle bug"

Jordan Crouse jordan.crouse at amd.com
Tue Sep 18 13:18:28 EDT 2007


Okay - after some investigation and talking to the original author of
the Cimarron code, I have some answers.

> So the request gets through the amd_drv upload hook, and eventually
> we reach gp_color_bitmap_to_screen_blt(), whose purpose is to do
> the actual uploading:

The *real* purpose of the gp_color_bitmap_to_screen_blt() function is to
allow uploads from system memory with arbitrary ROPs.  Since we're only 
ever doing a straight source copy (0xCC), we really don't need all the
additional logic.  So Bernie's recommendation that we eliminate the
upload() function altogether is the right solution, provided that the
default EXA function waits for the command buffer to clear first.
Otherwise, we'll need our own simple upload function that calls
gp_wait_until_idle() first.
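
For illustration, a minimal sketch of what such a simple upload function
could look like.  The name simple_upload() and its parameters are
hypothetical, and gp_wait_until_idle() is stubbed out here so the
fragment stands alone; in the driver it would be the real Cimarron call.
The copy goes row by row because source and destination pitches usually
differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the Cimarron gp_wait_until_idle(): the real function blocks
 * until the graphics processor is idle.  This stub only counts calls. */
static int idle_waits = 0;
static void gp_wait_until_idle(void) { idle_waits++; }

/* Hypothetical helper: copy `height` rows of `width_bytes` bytes each from
 * system memory into a CPU-mapped destination.  Pitches are in bytes. */
static void simple_upload(unsigned char *dst, size_t dst_pitch,
                          const unsigned char *src, size_t src_pitch,
                          size_t width_bytes, size_t height)
{
    size_t y;

    assert(width_bytes <= dst_pitch && width_bytes <= src_pitch);

    /* Let any queued BLTs finish before the CPU touches the destination. */
    gp_wait_until_idle();

    /* One memcpy() per row, since rows are rarely contiguous in memory. */
    for (y = 0; y < height; y++)
        memcpy(dst + y * dst_pitch, src + y * src_pitch, width_bytes);
}
```

Whether the default EXA path already waits for the command buffer is
exactly the open question above; if it does, none of this is needed.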

The gp_color_bitmap_to_screen_blt() is indeed the way it is because of
virtual/physical translation concerns - if we can get around those, then
a blt would probably be faster, but it's hard to do in userspace, as we
well know.

> Other code confirms the statement in this comment: GP3_MAX_COMMAND_SIZE
> is defined to be 8K.  However, this limit is arbitrary: I couldn't find
> anywhere in the databook a reason why the blitter couldn't copy more
> than 8K of data.  The actual limit is 64K of DWORDS.  I guess 8KB was
> just chosen as a reasonable waste of buffer space.

The 8K limit in the command buffer was based on the assumption that we
wouldn't be handling any pixmaps wider than the widest possible visible
line (1920 * 4 = 7680 bytes).  We can crank that up if we want to, but
it will have a direct effect on how many BLTs we can queue up unless we
crank up the amount of command buffer memory, which eats into our video
memory, and so on and so forth.  If we just move to a straight memcpy()
above, then this is no longer a going concern.
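
The trade-off can be put in numbers with a rough sketch.  The helper
names and the 512K command buffer size used in the note below are
assumptions for illustration only, not values taken from the driver, and
per-command header overhead is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* The 8K per-command limit discussed in this thread. */
enum { MAX_COMMAND_SIZE = 8 * 1024 };

/* Bytes needed to carry one line of pixels in a single command. */
static size_t line_bytes(size_t width_px, size_t bytes_per_px)
{
    return width_px * bytes_per_px;
}

/* Worst-case number of maximum-size commands that fit in the command
 * buffer at once (hypothetical helper, ignores headers). */
static size_t max_queued_commands(size_t cmd_buffer_bytes,
                                  size_t max_cmd_bytes)
{
    assert(max_cmd_bytes > 0);
    return cmd_buffer_bytes / max_cmd_bytes;
}
```

With these helpers, line_bytes(1920, 4) is 7680, which fits under the 8K
limit.  For an assumed 512K command buffer, raising the per-command
limit from 8K to 64K would cut the worst-case queue depth from 64
commands to 8.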

> Moreover, the GPU is very well capable of wrapping its command pointer at
> arbitrary positions, even in the middle of a command.  And so should the
> software.  I strongly disagree with the claim in the comment that this
> strategy simplifies anything.

This is incorrect.  The wrap bit tells the command buffer to wrap at the
end of the command, not in the middle of the command.
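
A simplified model of that policy, as a sketch only: this is not the
Cimarron code, the struct and function names are made up, and it ignores
the hardware read pointer entirely.  Each command is placed in one
contiguous piece; if another maximum-size command would not fit after
it, the command is flagged to wrap and the next one starts at the top of
the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ring state: total size, largest allowed command, and the
 * offset where the next command will be written. */
struct ring {
    size_t size;
    size_t max_cmd;
    size_t next;
};

/* Where a command was placed, and whether it carries the wrap flag. */
struct slot {
    size_t offset;
    int wrap;
};

static struct slot ring_reserve(struct ring *r, size_t cmd_bytes)
{
    struct slot s;

    assert(cmd_bytes > 0 && cmd_bytes <= r->max_cmd);
    /* Invariant: there is always room for one full command at r->next. */
    assert(r->next + cmd_bytes <= r->size);

    s.offset = r->next;
    r->next += cmd_bytes;

    /* Wrap after this command if a maximum-size successor would not fit;
     * a command is never split across the end of the buffer. */
    s.wrap = (r->next + r->max_cmd > r->size);
    if (s.wrap)
        r->next = 0;

    return s;
}
```

The cost of this scheme is some unused space at the bottom of the
buffer on each lap, which is one reason the maximum command size
matters.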

The bottom line is that you absolutely, positively do not want to get in
the business of messing with the command buffer functions - unless you want
to break a lot of stuff.  These functions have been carefully tuned to 
ensure that wrapping and other intelligence work well.  If you think yourself
suited to writing your own, there is a 100% chance of pain.

If you want to replace the WRITE_COMMAND* macros, feel free - but remember
that bitmaps almost always need to be copied line by line - few pixmaps
are stored contiguously.

So to summarize:

> Removing all of the asm wizardry (useless IMHO, maybe even
> counter-productive)

Remove whatever macros you think you need to - but remember, if it ain't
broke, don't fix it, and please send it to this list before putting it
anywhere near the production code.

> - Implementing access macros for the ring buffer using the normal,
>    plain wrapping policy of all ring buffers

NAK.  The ring buffers work, don't change them.

> - Killing the WRITE_COMMAND32() and WRITE_COMMANDSTRING32() abstractions.

If you want, keeping in mind what I said before.

> - Removing gp_declare_blt(), which needs to be called before starting
>   any blitting operation

Utter and absolutely NAK - this would break the entire system horribly.

> - Seeing if we can get the blitter to read source data directly from system
>   memory.  I'd be very surprised if there was no way to make it work
>   with virtual memory enabled, because, without such a mechanism, the
>   blitter would be less than fully useful.

You can't make it go with virtual memory, so NAK on this one too.

Jordan
-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.




