15 computer science collegians looking for a project

NoiseEHC NoiseEHC at freemail.hu
Wed Apr 30 04:18:24 EDT 2008


> On 29/04/08 17:41 +0200, NoiseEHC wrote:
>   
>> On this page
>> http://wiki.laptop.org/go/Geode_LX
>> I have named some instructions as "Synchronized ops" (in the MMX 
>> section). Are those real, or did I mismeasure something?
>>     
>
> That section is very difficult to understand.  I'm not sure which
> operations you have invented this name for.
>   
As you have probably noticed, I am not a native English speaker (I never 
learned advanced English in school, I just picked it up). What I wanted 
to say in that section is that every MMX op whose source or destination 
operand is an integer register (and which is not a MOV) takes a 
completely different number of clock cycles than 2 (2 is what the 
databook lists for almost every MMX op, at least in my version). Is 
that real?
>> If those are 
>> real then would somebody from AMD just go through the databook and fix 
>> the instruction clock cycle numbers? Because in that case it is sure 
>> that they do not match reality and clearly I have better things to do 
>> than measuring clock cycles. 
>>     
>
> Clearly you must have some basis for assuming that the numbers are
> wrong, so you must have done some measurement.  I consulted the
> secret documentation that you claim I am withholding from you, 
> and the timings there are the same as in the datasheet.  I believe that
> you are correct in that these are the clock counts for the instruction to
> go through the FPU and don't include the stall time for the pipeline
> to clear up.
>   
There is a "Test results" section on that page. The first two tests were 
conducted via email: I mailed test programs to this list, and people 
ran them and mailed back the results. The first test in particular has 
some stupid bugs, because I wrote the programs essentially blind. The 
third one is the result of my own session, logged into a physical 
machine. It may be that only this "stall time" is missing from the 
databook, but as a programmer I am not interested in how many clock 
cycles the FPU takes to execute some internal operation (which is what 
the databook seems to list). I would like to know the real time 
consumed.
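
To illustrate what "real time consumed" means here, this is a minimal 
sketch of the usual arithmetic: time a loop containing the op, time the 
same loop without it, and divide the difference by the number of ops 
executed. The function name and all numbers are hypothetical; they are 
not measurements from the wiki page.

```python
def cycles_per_op(total_cycles, baseline_cycles, iterations, ops_per_iteration=1):
    """Effective clock cycles per instruction, with loop overhead subtracted.

    total_cycles    -- cycle count of the loop that contains the op under test
    baseline_cycles -- cycle count of the same loop with the op removed
    """
    if iterations <= 0 or ops_per_iteration <= 0:
        raise ValueError("iterations and ops_per_iteration must be positive")
    return (total_cycles - baseline_cycles) / (iterations * ops_per_iteration)


# Hypothetical cycle counts, for illustration only.
effective = cycles_per_op(total_cycles=9_000_000, baseline_cycles=1_000_000,
                          iterations=1_000_000, ops_per_iteration=4)
print(effective)
```

A figure obtained this way includes any pipeline stall the op causes, 
which is exactly the part a per-unit clock count would leave out.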

> I am not a silicon designer, so I'm not the final word on if they are
> correct or not, but at least that should prove that there isn't a
> massive marketing conspiracy to hide the details of the processor
> from our customers.  If they are lying to you, they are lying to me,
> and they're not lying to me.
>
>   
The conspiracy remark was not serious; I put a smiley at the end. 
However, from my perspective it makes no difference whether there is a 
conspiracy or not. What I actually think is that either I am mistaken 
and made some errors while measuring, or the technical writer made 
mistakes years ago and nobody cared to fix them.
>   
>> Also the legend is clearly wrong in several 
>> cases so probably that would need checking too (like on page 668 note 4 
>> talks about 3DNOW ops in the table about FP ops).
>>     
>
> That is a mistake - I have let the technical writer know about it.
>   
Thanks!
Another error:
Page 631 describes this notation:
Conditional jump taken | Conditional jump not taken. (e.g., "4|1" = four 
clocks if jump taken, one clock if jump not taken).
This notation is never used in the opcode table.
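
For reference, a small sketch of how an entry in that "taken|not taken" 
notation would be read if the table used it. The helper names and the 
taken fraction are assumptions for illustration, and the sketch does not 
model mispredicted jumps.

```python
def parse_jump_clocks(entry):
    """Split a 'taken|not taken' clock entry such as '4|1' into two ints."""
    taken, not_taken = (int(part) for part in entry.split("|"))
    return taken, not_taken


def expected_jump_clocks(entry, taken_fraction):
    """Average clocks per jump when taken_fraction of the jumps are taken."""
    if not 0.0 <= taken_fraction <= 1.0:
        raise ValueError("taken_fraction must be between 0 and 1")
    taken, not_taken = parse_jump_clocks(entry)
    return taken_fraction * taken + (1.0 - taken_fraction) * not_taken


# The "4|1" example from the legend, with a hypothetical 50% taken rate.
print(parse_jump_clocks("4|1"))
print(expected_jump_clocks("4|1", 0.5))
```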
>   
>> absolutely no info about L2 cache miss penalties or mispredicted jumps 
>> or about the pipeline stages of the FP unit.
>>     
>
> I don't have any information about L2 cache miss penalties, but they 
> are easy to calculate. Please see:
>
> http://homepages.cwi.nl/~manegold/Calibrator/
>   
Could you run it on your machine and share the results? I currently do 
not have access to an XO.
> I will talk to somebody about documenting the FP unit pipeline.
> It does handle 1 instruction per clock from the integer unit.
> In practice we know that two floating point instructions back to
> back will stall the IU.  I can also tell you that it is optimized
> for single precision, so double precision is handled by microcode
> and needs to go through the path again. 
>
>   
Thanks!
I would also like to know how many ALUs the FPU has. I mean, FMUL 
costs 1 and PFMUL costs 2. Is that because it has only one multiply 
unit and executes PFMUL serially? If so, does that mean that the 3DNOW 
support is there only for compatibility and will not be faster than 
plain FP?
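
The arithmetic behind this question, as a minimal sketch. The clock 
counts 1 and 2 are the ones quoted above; PFMUL multiplies two packed 
single-precision values per instruction. The function name is 
hypothetical, and the sketch ignores loads, stores and pipeline stalls.

```python
def cycles_per_multiply(instruction_cycles, multiplies_per_instruction):
    """Clock cycles spent per individual multiply."""
    return instruction_cycles / multiplies_per_instruction


fmul = cycles_per_multiply(1, 1)   # FMUL: one multiply in 1 clock
pfmul = cycles_per_multiply(2, 2)  # PFMUL: two packed multiplies in 2 clocks
print(fmul, pfmul)
```

If both come out the same, the packed instruction gives no 
multiply-throughput advantage by itself, which is what the question 
above is trying to confirm.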
>   
>> See, all I would like to have is enough data that when I look at 
>> assembly code I could approximately calculate how many clock cycles will 
>> be consumed. Nothing more and nothing less.
>>     
>
> You have nearly all the information you need, and you can collect the
> additional information the same way we do, with careful analysis and
> measurement.  In fact, Bernie and Vladimir Makarov have done a lot
> of work already in this area, resulting in the Geode specific
> code for gcc 4.2.0 and glibc.  Perhaps you can work with them to figure
> out the finer details of the FPU scheduling.  I'm sure they would
> appreciate it.
>
> Jordan
>
>   
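
The kind of estimate I mean could look like the sketch below. It is 
only an illustration: the two clock counts are the ones mentioned in 
this thread, the length of the back-to-back FP stall is unknown and is 
therefore left as a parameter, and treating PFMUL as a floating point 
instruction for that stall is an assumption. All names are hypothetical.

```python
# Clock counts quoted in this thread; a real table would need many more entries.
CLOCKS = {"FMUL": 1, "PFMUL": 2}
# Ops assumed (not confirmed) to trigger the back-to-back FP stall.
FP_OPS = {"FMUL", "PFMUL"}


def estimate_clocks(ops, stall_clocks=0, clocks=CLOCKS):
    """Sum per-op clocks, adding stall_clocks for each FP op that
    directly follows another FP op."""
    total = 0
    previous_was_fp = False
    for op in ops:
        total += clocks[op]
        is_fp = op in FP_OPS
        if is_fp and previous_was_fp:
            total += stall_clocks
        previous_was_fp = is_fp
    return total


sequence = ["FMUL", "PFMUL", "FMUL"]
print(estimate_clocks(sequence))                  # databook-style sum
print(estimate_clocks(sequence, stall_clocks=3))  # with a hypothetical stall
```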

