9.1 Proposal: Top five performance problems

Mon Oct 27 11:28:21 EDT 2008

On 26/10/08 14:21 -0400, Erik Garrison wrote:
> On Fri, Oct 24, 2008 at 6:36 PM, Jordan Crouse <jordan.crouse at amd.com> wrote:
> > On 25/10/08 00:00 +0200, NoiseEHC wrote:
> >> The Geode X driver copies every bit of data to the command ring buffer by
> >> using the CPU, so that "almost no CPU cycles" claim is surely at least a
> >> bit of a stretch... :) According to Jordan Crouse it will not be
> >> better, but he was not too concrete, so in the end I am not sure what he
> >> was really talking about, see:
> >> http://lists.laptop.org/pipermail/devel/2008-May/014797.html
> >
> > Indeed - many CPU cycles are used during compositing.  There is a lot of
> > math that happens to generate the masks and other collateral to render
> > the alpha icon on the screen.  The performance savings in the composite
> > code comes from not having to read video memory to get the src pixel
> > for the alpha operation(s).  That performance savings is already available
> > in the X driver today.
> 
> Ah!
> 
> So what work needs to be done to realize these performance savings?
> Or are you saying that we are already getting them by using composite?
> Or by another method?

You mostly have them now.  In fact, you have had them in the driver
for the better part of a year and a half.  We don't support all
composite operations and I'm not even going to begin to pretend that
there aren't bugs all over the place, but for the most part you should
already be experiencing whatever gains the GPU can give you.

> Also, here:
> 
> > The performance savings in the composite
> > code comes from not having to read video memory to get the src pixel
> > for the alpha operation(s).
> 
> Do you mean "not having to generate the video memory to get the src
> pixel"?  By not asking applications to redraw themselves aren't we
> saving CPU cycles?

No, I mean what I said.  An alpha blend operation requires three inputs -
the source color, the destination color and the alpha value.  In order
to do the alpha operation in system memory, you may need to read the
destination color from video memory, since it could have been calculated
as part of another operation.  Due to the way that the video memory is
cached, it is painfully slow for the system to read from video memory.
The GPU helps by doing the alpha blending operation in hardware.  It only
needs the alpha value and the source color, which we can readily provide
from the X server.  It then performs the operation directly on video
memory.  This saves CPU cycles partly because the CPU no longer does the
alpha blending math, but mainly because the processor doesn't need to stall
while reading video memory.
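
To make the three-input point concrete, here is a rough sketch of what a
software blend has to do per pixel.  It is illustrative Python, not code from
the X driver: the CountingFramebuffer class, the function names and the
8-bit rounding formula are all assumptions made for this example.  The thing
to notice is the fb.read() call, which stands in for the slow read from
video memory that the GPU path avoids.

```python
def blend_channel(src, dst, alpha):
    """Blend one 8-bit channel; alpha=255 yields src, alpha=0 yields dst."""
    return (src * alpha + dst * (255 - alpha) + 127) // 255


def blend_pixel(src, dst, alpha):
    """Blend an (r, g, b) source pixel over a destination pixel."""
    return tuple(blend_channel(s, d, alpha) for s, d in zip(src, dst))


class CountingFramebuffer:
    """Hypothetical stand-in for video memory that counts reads."""

    def __init__(self, width, height, fill=(0, 0, 0)):
        self.width = width
        self.pixels = [fill] * (width * height)
        self.reads = 0

    def read(self, x, y):
        # On real hardware this is the expensive step: the CPU stalls
        # while the destination color comes back from video memory.
        self.reads += 1
        return self.pixels[y * self.width + x]

    def write(self, x, y, rgb):
        self.pixels[y * self.width + x] = rgb


def software_composite(fb, icon, mask, x0, y0):
    """Blend icon over fb on the CPU; needs src, alpha AND dst per pixel."""
    for dy, (row, mask_row) in enumerate(zip(icon, mask)):
        for dx, (src, alpha) in enumerate(zip(row, mask_row)):
            dst = fb.read(x0 + dx, y0 + dy)  # third input, read back from fb
            fb.write(x0 + dx, y0 + dy, blend_pixel(src, dst, alpha))
```

In the hardware path the CPU would only hand over the icon and the mask (the
source color and alpha value); the destination read in software_composite()
is the part that moves onto the GPU.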

Jordan

-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.