15 computer science collegians looking for a project

Tue Apr 29 12:20:31 EDT 2008

On 29/04/08 17:41 +0200, NoiseEHC wrote:
> On this page
> http://wiki.laptop.org/go/Geode_LX
> I have named some instructions as "Synchronized ops" (in the MMX 
> section). Are those real or did I mismeasure something?

That section is very difficult to understand.  I'm not sure which
operations you have invented this name for.

> If those are 
> real then would somebody from AMD just go through the databook and fix 
> the instruction clock cycle numbers? Because in that case it is sure 
> that they do not match reality and clearly I have better things to do 
> than measuring clock cycles. 

Clearly you must have some basis for assuming that the numbers are
wrong, so you must have done some measurement.  I consulted the
secret documentation that you claim I am withholding from you, 
and the timings there are the same as in the datasheet.  I believe
you are correct that these are the clock counts for an instruction to
go through the FPU, and that they don't include the stall time for the
pipeline to clear.

I am not a silicon designer, so I'm not the final word on whether they
are correct, but at least that should prove that there isn't a
massive marketing conspiracy to hide the details of the processor
from our customers.  If they are lying to you, they are lying to me,
and they're not lying to me.

> Also the legend is clearly wrong in several 
> cases so probably that would need checking too (like on page 668 note 4 
> talks about 3DNOW ops in the table about FP ops).

That is a mistake - I have let the technical writer know about it.

> absolutely no info about L2 cache miss penalties or mispredicted jumps 
> or about the pipeline stages of the FP unit.

I don't have any information about L2 cache miss penalties, but they
are easy to measure. Please see:

http://homepages.cwi.nl/~manegold/Calibrator/
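For anyone who wants to see the general idea behind that kind of
measurement, here is a rough pointer-chasing sketch in C.  It is not the
Calibrator code and not anything from AMD; the function names
(build_chain, measure_working_set) and the use of clock() for timing are
my own assumptions for illustration.  Each load depends on the previous
one, so the average time per load approximates the access latency for a
given working-set size.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper names; this is an illustrative sketch only. */
static volatile size_t g_sink;  /* keeps the chase loop from being optimized away */

/* Link n slots into one random cycle (Sattolo's algorithm), so every load
   depends on the previous one and simple prefetching does not help. */
static void build_chain(size_t *next, size_t n)
{
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n; i > 1; i--) {
        size_t a = i - 1;
        size_t j = (size_t)rand() % a;  /* j in [0, a): guarantees a single cycle */
        size_t tmp = next[a];
        next[a] = next[j];
        next[j] = tmp;
    }
}

/* Return the average nanoseconds per dependent load for a working set of
   `bytes` bytes, or -1.0 if the allocation fails. */
static double measure_working_set(size_t bytes, size_t steps)
{
    assert(steps > 0);
    size_t n = bytes / sizeof(size_t);
    if (n < 2)
        n = 2;
    size_t *next = malloc(n * sizeof *next);
    if (next == NULL)
        return -1.0;
    build_chain(next, n);

    size_t p = 0;
    for (size_t s = 0; s < n; s++)      /* warm-up pass over the whole chain */
        p = next[p];

    clock_t t0 = clock();
    for (size_t s = 0; s < steps; s++)  /* timed chain of dependent loads */
        p = next[p];
    clock_t t1 = clock();

    g_sink = p;
    free(next);
    return (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / (double)steps;
}
```

Calling measure_working_set() with sizes from a few KiB up to a few MiB
should show plateaus in the time per load.  The step between the plateau
that fits in L2 and the one that does not is an estimate of the L2 miss
penalty; multiply nanoseconds by the core clock in GHz to get cycles.
Use a large step count so that the clock() resolution does not dominate.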

I will talk to somebody about documenting the FP unit pipeline.
The FP unit does accept one instruction per clock from the integer unit.
In practice we know that two back-to-back floating point instructions
will stall the IU.  I can also tell you that the FPU is optimized for
single precision, so double precision is handled by microcode and
needs to go through the path a second time.
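One way to check the single versus double precision difference yourself
is to time a chain of dependent FP operations in each precision.  The
following C sketch is only an illustration under my own assumptions: the
function names are made up, clock() is used for timing, and which
instructions the compiler actually emits depends on your target flags,
so check the generated assembly before trusting the numbers.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helpers; an illustrative sketch, not a calibrated benchmark. */

/* Time a chain of dependent single-precision multiply-adds.
   Returns average nanoseconds per iteration; the result goes to *out
   so the compiler cannot discard the loop. */
static double time_float_chain(long iters, float *out)
{
    assert(iters > 0);
    volatile float seed = 1.0f;  /* volatile read blocks constant folding */
    float x = seed;
    clock_t t0 = clock();
    for (long i = 0; i < iters; i++)
        x = x * 0.999999f + 0.000001f;  /* each step depends on the last */
    clock_t t1 = clock();
    *out = x;
    return (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / (double)iters;
}

/* Same chain in double precision, for comparison. */
static double time_double_chain(long iters, double *out)
{
    assert(iters > 0);
    volatile double seed = 1.0;
    double x = seed;
    clock_t t0 = clock();
    for (long i = 0; i < iters; i++)
        x = x * 0.999999 + 0.000001;
    clock_t t1 = clock();
    *out = x;
    return (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / (double)iters;
}
```

Run both with the same large iteration count and compare the per-iteration
times.  If double precision really takes a second pass through the FPU,
the double chain should come out noticeably slower than the float chain.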

> See, all I would like to have is enough data that when I look at 
> assembly code I could approximately calculate how many clock cycles will 
> be consumed. Nothing more and nothing less.

You have nearly all the information you need, and you can collect the
additional information the same way we do, with careful analysis and
measurement.  In fact, Bernie and Vladimir Makarov have done a lot
of work already in this area, resulting in the Geode specific
code for gcc 4.2.0 and glibc.  Perhaps you can work with them to figure
out the finer details of the FPU scheduling.  I'm sure they would
appreciate it.

Jordan