15 computer science collegians looking for a project
NoiseEHC
NoiseEHC at freemail.hu
Wed Apr 30 04:18:24 EDT 2008
> On 29/04/08 17:41 +0200, NoiseEHC wrote:
>
>> On this page
>> http://wiki.laptop.org/go/Geode_LX
>> I have labeled some instructions "Synchronized ops" (in the MMX
>> section). Are those real, or did I mismeasure something?
>>
>
> That section is very difficult to understand. I'm not sure which
> operations you have invented this name for.
>
As you have probably already noticed, I am not a native English speaker
(I never learned advanced English in school; I just picked it up).
What I meant to say in that section is that every MMX op whose source or
destination operand is an integer register (other than a MOV) takes a
completely different number of clock cycles than 2, which is the figure
the databook lists for almost every MMX op (at least in my version). Is
that real?
>> If those are
>> real, could somebody from AMD go through the databook and fix the
>> instruction clock cycle numbers? In that case they clearly do not
>> match reality, and I have better things to do than measure clock
>> cycles.
>>
>
> Clearly you must have some basis for assuming that the numbers are
> wrong, so you must have done some measurement. I consulted the
> secret documentation that you claim I am withholding from you,
> and the timings there are the same as in the datasheet. I believe that
> you are correct in that these are the clock counts for the instruction to
> go through the FPU and don't include the stall time for the pipeline
> to clear up.
>
There is a "Test results" section on that page. The first two tests were
conducted via email: I emailed test programs to this list, and people ran
them and sent back the results. The first test in particular has some
silly bugs, because I wrote it essentially blind. The third test is the
result of a session I ran while logged into a physical machine. It may be
that only this "stall time" is missing from the databook, but as a
programmer I am not interested in how many clock cycles the FPU takes to
execute some internal operation (which is what the databook seems to
list); I want to know the real time consumed.
> I am not a silicon designer, so I'm not the final word on whether they
> are correct or not, but at least that should prove that there isn't a
> massive marketing conspiracy to hide the details of the processor
> from our customers. If they are lying to you, they are lying to me,
> and they're not lying to me.
>
>
The conspiracy remark was not serious; I put a smiley at the end.
However, from my perspective it makes no difference whether there is a
conspiracy or not. What I actually think is that either I made some
errors measuring this, or the technical writer made mistakes years ago
and nobody bothered to fix them.
>
>> Also, the legend is clearly wrong in several places, so that probably
>> needs checking too (for example, on page 668, note 4 talks about 3DNOW
>> ops in the table of FP ops).
>>
>
> That is a mistake; I have let the technical writer know about it.
>
Thanks!
Another error:
On page 631 the legend describes this notation:
Conditional jump taken | Conditional jump not taken (e.g., "4|1" = four
clocks if jump taken, one clock if jump not taken).
However, it is never used in the opcode table.
>
>> absolutely no info about L2 cache miss penalties or mispredicted jumps
>> or about the pipeline stages of the FP unit.
>>
>
> I don't have any information about L2 cache miss penalties, but they
> are easy to calculate. Please see:
>
> http://homepages.cwi.nl/~manegold/Calibrator/
>
Could you run it on your machine and share the results? Currently I do
not have access to an XO.
> I will talk to somebody about documenting the FP unit pipeline.
> It does handle 1 instruction per clock from the integer unit.
> In practice we know that two floating point instructions back to
> back will stall the IU. I can also tell you that it is optimized
> for single precision, so double precision is handled by microcode
> and needs to go through the path again.
>
>
Thanks!
I would also like to know how many ALUs the FPU has. For example, FMUL
costs 1 clock while PFMUL costs 2. Is that because there is only one
multiply unit, which executes the two halves of PFMUL serially? If so,
does that mean the 3DNOW support is there only for compatibility and will
not be faster than plain scalar FP?
>
>> See, all I would like is enough data that, when I look at assembly
>> code, I can approximately calculate how many clock cycles it will
>> consume. Nothing more and nothing less.
>>
>
> You have nearly all the information you need, and you can collect the
> additional information the same way we do, with careful analysis and
> measurement. In fact, Bernie and Vladimir Makarov have done a lot
> of work already in this area, resulting in the Geode-specific
> code for gcc 4.2.0 and glibc. Perhaps you can work with them to figure
> out the finer details of the FPU scheduling. I'm sure they would
> appreciate it.
>
> Jordan
>
>