MMU vs GPU (Was: The "iGoogle bug")

NoiseEHC NoiseEHC at freemail.hu
Fri Sep 21 07:31:11 EDT 2007


According to the databook, GP_BASE_OFFSET (page 270) is included in the 
command buffer (page 239). If you push the command buffer into the 
kernel then implementing that should be trivial. (Now I have realized 
that X runs in user mode and this AMD driver is not a kernel driver but 
an X driver. What a stupid architecture...) Since GP_BASE_OFFSET defines 
a 16MB-long buffer on a 4MB boundary, if the driver rejects bitmaps 
larger than 12MB then you do not even have to split at 16MB boundaries 
(only at every 4KB). Of course, you have to split at page boundaries 
anyway, so you can just recalculate GP_BASE_OFFSET every time...
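To make that splitting concrete, here is a rough sketch of the per-page arithmetic in plain C. The helper name `gp_window_for` is mine, not from the databook; only the 16MB window size and 4MB alignment come from the description above:

```c
#include <assert.h>
#include <stdint.h>

#define GP_WINDOW_ALIGN (4u << 20)   /* GP_BASE_OFFSET must sit on a 4 MB boundary */
#define GP_WINDOW_SIZE  (16u << 20)  /* the window it opens covers 16 MB */
#define PAGE_SIZE_4K    4096u

/* Hypothetical helper: for the physical address of one 4 KB page,
 * compute a GP_BASE_OFFSET value (aligned down to 4 MB) and the
 * offset of the page inside the resulting 16 MB window. */
static void gp_window_for(uint32_t phys, uint32_t *base, uint32_t *offset)
{
    *base   = phys & ~(GP_WINDOW_ALIGN - 1);
    *offset = phys - *base;
    /* The base is at most 4 MB below the page, so a 4 KB page always
     * fits inside the 16 MB window -- no separate 16 MB split needed. */
    assert(*offset + PAGE_SIZE_4K <= GP_WINDOW_SIZE);
}
```

A blit over a scattered bitmap would then walk it one 4KB page at a time, calling something like this per page and emitting one command-buffer entry each; for instance a page at physical 0x01234000 gets base 0x01000000 and offset 0x234000.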
However, it is not clear whether it would increase speed enough to be 
worth implementing; after all, one line of data will be held in the L1/L2 
cache, so the video processor does not need to fetch from memory. Keeping 
buffers in the X client's memory space would speed things up, but I think 
that it would break some X semantics (for example, if you push a bitmap 
to the X server, the push would become a no-op, but after that the client 
should not modify the bitmap). This would take a looong time to implement.
I wanted to look at X.org's implementation but their servers seem to be 
down.

Bernardo Innocenti wrote:
> NoiseEHC wrote:
>
>   
>>>  - Seeing if we can get the blitter to read source data directly from system
>>>    memory.  I'd be very surprised if there was no way to make it work
>>>    with virtual memory enabled, because, without such a mechanism, the
>>>    blitter would be less than fully useful.
>>>   
>>>       
>> Could somebody shed some light on this, please?
>>
>> I think that the Linux kernel probably has some page-locking function 
>> which returns a list of physical addresses for a virtual address, does 
>> it not?
>>     
>
> That's virt_to_phys(), yes... but it's not available in userspace.
> All the people I've consulted agreed it's not easy to translate
> virtual addresses from within a process.
>
>
>   
>> The Channel 3 DMA can be programmed to read from any 16MB block 
>> from the 32 bit address space. Why is it hard to combine the two?
>> Why is it even necessary to upload bitmaps to "video memory"?
>>     
>
> Yes... UMA systems already pay a price in terms of memory bandwidth;
> they should at least be compensated with the advantage of not having
> to do the migration crap.
>
> It's very likely a leftover from the original PC architecture with
> separate CGA/EGA/VGA cards.  Even now that GPUs are being integrated
> on the same physical die as the CPU, they still look and act like
> external PCI devices :-)
>
> DRM is supposed to help solve the virt_to_phys() problem.
> But if we could do as you suggest and just use bitmaps scattered
> through memory pages, we'd be *much* faster.
>
>   
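(For reference, the usual kernel-side way around virt_to_phys() not being usable from a process is get_user_pages(): a kernel driver pins the client's pages and reads their physical addresses. A rough sketch only, assuming a 2.6-era API, with error handling mostly elided -- this is kernel code, not something the X driver could do from user space:)

```c
/* Kernel-side sketch: pin npages of a user buffer starting at uaddr
 * and collect the physical address of each page, which could then be
 * fed to the blitter/DMA engine.  Assumes a 2.6-era kernel API. */
#include <linux/mm.h>
#include <linux/sched.h>
#include <linux/errno.h>

static int pin_user_buffer(unsigned long uaddr, int npages,
                           struct page **pages, unsigned long *phys)
{
    int i, got;

    down_read(&current->mm->mmap_sem);
    got = get_user_pages(current, current->mm, uaddr & PAGE_MASK,
                         npages, 0 /* read-only */, 0 /* no force */,
                         pages, NULL);
    up_read(&current->mm->mmap_sem);
    if (got < npages)
        return -EFAULT;  /* caller must put_page() the ones we did get */

    for (i = 0; i < npages; i++)
        phys[i] = page_to_phys(pages[i]);
    return 0;
}
```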



More information about the Devel mailing list