performance work

Greg Smith gregsmitholpc at gmail.com
Wed Dec 31 13:54:11 EST 2008


Hi All,

Great thread. I don't know the history but I completely agree with 
Jordan. It takes a dedicated team of engineers at least two years of 
software work to optimize available resources.

The main memory vs. video memory debate is age-old. Until someone builds a 
better programming language and architecture for addressing the DCON 
frame buffer directly, we need to optimize the architecture we have.

Moore's law is against us but we have 500,000 units in the field and can 
more than double that in 12 months (Moore be damned :-). Nail this 
problem quickly and we gain an industry-wide edge.

I collected related performance threads in the specification section here:
http://wiki.laptop.org/go/Feature_roadmap/General_UI_sluggishness

Did I miss instructions on how to determine which of the benchmarked 
Cairo operations Sugar calls most often?

Can someone report how often the top 10 offenders below are called 
when using Sugar:
http://wiki.laptop.org/go/Feature_roadmap/General_UI_sluggishness#Test_data_comparison

Ask if it's not clear. First steps may be documented here: 
http://wiki.laptop.org/go/Performance_tuning#Other
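
As a rough illustration of what such a report could look like, here is a minimal sketch in Python. It assumes you already have a trace of operation names captured while using Sugar, plus a per-call cost for each operation taken from benchmark results. The function name, the operation names and every number below are made-up placeholders, not real measurements or real Cairo calls.

```python
from collections import Counter


def rank_offenders(trace, cost_per_call, top=10):
    """Rank operations by estimated total time: call count * per-call cost."""
    calls = Counter(trace)
    totals = {op: n * cost_per_call.get(op, 0.0) for op, n in calls.items()}
    # Most expensive first, truncated to the requested number of entries.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:top]


# Hypothetical placeholder data: names and costs are invented for illustration.
trace = ["op_a", "op_b", "op_a", "op_c", "op_a", "op_b"]
cost_per_call = {"op_a": 2.0, "op_b": 10.0, "op_c": 1.0}

for op, total in rank_offenders(trace, cost_per_call):
    print(op, total)
```

The point of weighting by call count is that a slow operation Sugar rarely calls may matter less than a moderately slow one it calls constantly.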

Our development bottleneck could be X-Windows (and Cairo) people. Can 
someone send an e-mail to the right list and ask for help?

Jordan told us which X functions he thinks will pay off. See 
http://wiki.laptop.org/go/Feature_roadmap/General_UI_sluggishness#X_optimization_suggestions

That's not asking for new functions, just calling well-known ones. I'm 
optimistic compositing hooks will be a huge win....

Thanks,

Greg S

> Date: Wed, 31 Dec 2008 09:20:27 -0700
> From: Jordan Crouse <jordan at cosmicpenguin.net>
> Subject: Re: performance work
> To: Lerc at screamingduck.com
> Cc: devel at lists.laptop.org, greg at laptop.org
> Message-ID: <495B9BCB.2010505 at cosmicpenguin.net>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> 
> Neil Graham wrote:
>> On Tue, 2008-12-30 at 20:41 -0700, Jordan Crouse wrote:
>>>> I'm curious as to why reads from video memory are so slow.  On standard
>>>> video cards it's slow because there is quite a division between the CPU
>>>> and the video memory,  but on the Geode isn't the video memory shared in
>>>> the same SDRAM as main memory?
>>> It is, in that they share the same physical RAM chips, but they are 
>>> controlled by different entities - one is managed by the system memory 
>>> controller and the other is handled by the GPU.   At start up time, the 
>>> memory is carved up by the firmware, and after the top of system RAM is 
>>> established, video and system memory behave for all intents and purposes 
>>> like separate components.  Put simply, there is no way to directly 
>>> address video memory from the system memory.  Access to the video memory 
>>> has to happen via PCI cycles, and for obvious reasons the active video 
>>> region has the cache disabled, accounting for relatively slow readback.
>> That makes my brain melt, you can't address it even though it's on the
>> same chip!?!  Even as far back as the PCjr the deal was that sharing
>> video memory cost some performance due to taking turns with cycles but
>> it gave some back with easy access to the memory for all.   Has the
>> Geode cunningly managed to provide a system that combines all the
>> disadvantages of separate memory with all the disadvantages of shared?
>>
>> One wonders what would happen if you wired some lines to the chips so
>> that the memory appeared in two places,  would you get access to the ram
>> (with the usual 'you pays your money, you takes your chances' caveats
>> about coherency)?
>>
>> I'm not a hardware person, but that all just seems odd.
> 
> You are missing the point - this model wasn't designed so that the 
> system could somehow sneakily address video memory, it was designed so 
> that the system designer could eliminate the need for the added cost, 
> expense and real estate for a separate bank of memory chips.  See also
> http://en.wikipedia.org/wiki/Shared_Memory_Architecture.
> 
>>> That said, the read from memory performance is still worse than you
>>> might expect - I never really got a good answer from
>>> the silicon guys as to why. 
>>>
>> being hit with the full sdram latency every access maybe?
>>
>> Is it feasible to try with caches enabled and require the software to
>> flush as needed?
> 
> Ask around - I don't think that you'll find anybody too keen on having 
> the X server execute a cache invalidate a half dozen times a second.
> 
> Anyway, you are getting distracted and solving the wrong problem.  You 
> should be more concerned about limiting the number of times that the X 
> server reads from video memory rather than worrying about how fast the 
> read is.
> 
> If I can rant for a second (and this isn't targeted at Neil 
> specifically, but just in general), this is another in a list of 
> more or less hard constraints that the current XO design has. 
> Throughout the history of the project, it seems to me that developers 
> have been more biased toward trying to eliminate those constraints 
> rather than making the software work in spite of them.  The processor is 
> too slow - everybody immediately wants to overclock.  There is too 
> little memory - enter a few dozen schemes for compressing it or swapping it.
> 
> The XO platform has limitations, most of which were introduced by choice 
> for power or cost reasons.  The limitations are clearly documented and 
> were known by all, at least when the project started.  The understanding 
> was that the software would have to be adjusted to fit the hardware, not 
> the other way around.  Over time, we seem to have lost that understanding.
> 
> Software engineering is hard - software engineering for 
> resource-constrained systems is even harder.  In this day and age geeks like us 
> have been accustomed to always having the latest and greatest hardware 
> at our fingertips, and so the software that we write is also for the 
> latest and greatest.  And so, when confronted with a system such as the 
> XO, our first instinct is to just plop our software on it and watch it 
> go.  That attitude is further reinforced by the fact that the Geode is 
> x86 based - just like our desktops.  It should just work, right?  We 
> know better - or at least, we should know better.
> 
> The solution to the performance problems is good old fashioned elbow 
> grease.  We have to take our software that is naturally biased toward 
> the year 2007 and make it work for the year 1995.  That's going to 
> involve fixing bugs in the drivers, but also re-thinking how the 
> software works - and finding situations where the software might be 
> inadvertently doing the wrong thing. Let me give you an example - as 
> recently as X 1.5, operations involving an a8 alpha mask worked like this:
> 
> * Draw a 1x1 rectangle in video memory containing the source color for 
> the operation
> * Read the source color from video memory
> * Perform the mask operation with the source color
> 
> This isn't smart for any kind of processor or GPU, running at 2 GHz or 
> half a GHz.  The X server knows the source color from the start, why 
> don't we just use it?  We get away with it on a modern processor, but 
> it kills us on the Geode.  These are the sorts of things that we need to 
> find and squash - and yes, it will be very time consuming and a little 
> boring.  But if you care about performance, I mean really care about it 
> and aren't just out for the quick fix, these are the sorts of things that 
> we need to do.
> 
> Jordan
> 
> 
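
To make Jordan's a8 mask example concrete, here is a toy model in Python. It is only a sketch under stated assumptions: VideoMemory, apply_mask, mask_via_readback and mask_direct are hypothetical names invented for illustration, and none of this is actual X server or driver code. The stand-in for video memory simply counts readbacks, so you can see that the roundabout path pays for a slow read that the direct path never needs.

```python
class VideoMemory:
    """Toy stand-in for uncached video memory; counts writes and readbacks."""

    def __init__(self):
        self.pixels = {}
        self.reads = 0
        self.writes = 0

    def write(self, addr, value):
        self.writes += 1
        self.pixels[addr] = value

    def read(self, addr):
        self.reads += 1  # on the real hardware this is the expensive part
        return self.pixels[addr]


def apply_mask(color, mask):
    """Scale an (r, g, b) color by each 8-bit alpha value in the mask."""
    return [tuple(c * a // 255 for c in color) for a in mask]


def mask_via_readback(vram, color, mask):
    # Roundabout path from the list above: draw a 1x1 source, then read it back.
    vram.write("src_1x1", color)
    source = vram.read("src_1x1")
    return apply_mask(source, mask)


def mask_direct(vram, color, mask):
    # The caller already knows the color, so video memory is never touched.
    return apply_mask(color, mask)


vram = VideoMemory()
red, mask = (255, 0, 0), [0, 128, 255]
slow = mask_via_readback(vram, red, mask)
fast = mask_direct(vram, red, mask)
print(slow == fast, vram.reads)
```

Both paths produce the same pixels; the only difference is the read counter, which is the cost Jordan is asking us to hunt down.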


