Geode instruction scheduling questions

Thu May 31 15:27:25 EDT 2007

Hi!

I have some questions about the scheduling of instructions, since the 
PDF is quite sparse.
I have also read this thread:
http://lists.laptop.org/pipermail/devel/2006-August/001232.html

1. As I understand it, the Geode can only schedule 1 op/clock even 
though it has 2 execution units. Is that correct?

2. Does the IU (Integer Unit) perform integer MUL/DIV?

3. Is the IU pipelined? That is, are the clock numbers latencies or 
absolute times?

4. Is the FPU pipelined? That is, are the clock numbers latencies or 
absolute times?
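
To make the distinction in questions 3 and 4 concrete, here is a minimal 
Python sketch of the two interpretations as I would apply them on paper. 
The op names and clock counts are made-up placeholders, not values from 
the data book, and the one-op-per-clock issue rate is only my assumption 
from question 1.

```python
# Hypothetical cost model: each op is (name, clocks, depends_on_previous).
# The clock counts are placeholders, NOT taken from the Geode data book.
OPS = [
    ("op_a", 4, False),
    ("op_b", 4, False),  # independent of op_a
    ("op_c", 4, True),   # needs the result of op_b
]

def cycles_absolute(ops):
    """Unit is NOT pipelined: every op occupies it for its full clock count."""
    return sum(clocks for _, clocks, _ in ops)

def cycles_latency(ops):
    """Unit IS pipelined: one op issues per clock (assumption), and an op
    waits only when it depends on the result of the op right before it."""
    issue = 0      # earliest clock at which the current op can issue
    prev_done = 0  # clock at which the previous op's result is ready
    finish = 0     # clock at which the last result is ready
    for _, clocks, depends in ops:
        if depends:
            issue = max(issue, prev_done)
        prev_done = issue + clocks
        finish = max(finish, prev_done)
        issue += 1  # single issue: the next op starts one clock later at best
    return finish

print(cycles_absolute(OPS))  # 12
print(cycles_latency(OPS))   # 9
```

For this placeholder sequence the two readings give 12 and 9 clocks, 
which is why I need to know which one the tables mean.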

5. There is the following text on page 656:
"The CPU is functionally divided into the Floating Point Unit (FPU) unit 
and the Integer Unit. The FPU has been extended to
process MMX, AMD 3DNow!, and floating point instructions in parallel 
with the Integer Unit.
When the Integer Unit detects an MMX instruction, the instruction is 
passed to the FPU for execution. The Integer Unit continues
to execute instructions while the FPU executes the MMX instruction. If 
another MMX instruction is encountered, the
second MMX instruction is placed in the MMX queue. Up to six MMX 
instructions can be queued.
When the Integer Unit detects a floating point instruction without 
memory operands, after two clock cycles the instruction
passes to the FPU for execution. The Integer Unit continues to execute 
instructions while the FPU executes the floating
point instruction. If another FPU instruction is encountered, the second 
FPU instruction is placed in the FPU queue. Up to
four FPU instructions can be queued. In the event of an FPU exception, 
while other FPU instructions are queued, the state
of the CPU is saved to ensure recovery."

What is this 2-clock-cycle stall? Is it the time the op takes to pass 
through unused stages? What about 3DNow! instructions?
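
This is how I currently read the queueing described in the quote, as a 
small Python sketch. Only the queue depth of four comes from the quoted 
text. The fixed cost per FPU op, the 1-clock integer ops, the 
non-pipelined FPU and the rule that the Integer Unit stalls only when the 
queue is full are all my assumptions, and the 2-clock hand-off is ignored 
because I do not know how to model it.

```python
from collections import deque

FPU_QUEUE_DEPTH = 4  # "Up to four FPU instructions can be queued" (quoted text)

def simulate(stream, fpu_clocks, queue_depth=FPU_QUEUE_DEPTH):
    """Return (clock when the Integer Unit is done issuing, clock when all
    work is done).

    stream     -- list of "int" / "fpu" tokens in program order
    fpu_clocks -- ASSUMED fixed cost of every FPU op (placeholder value)
    """
    clock = 0
    fpu_free_at = 0    # clock at which the FPU finishes all accepted work
    pending = deque()  # finish times of FPU ops executing or queued
    for kind in stream:
        # Drop FPU ops that have already completed by this clock.
        while pending and pending[0] <= clock:
            pending.popleft()
        if kind == "fpu":
            # Capacity is one executing op plus queue_depth waiting ops.
            if len(pending) > queue_depth:
                clock = pending.popleft()  # stall until the oldest op ends
            start = max(clock, fpu_free_at)
            fpu_free_at = start + fpu_clocks
            pending.append(fpu_free_at)
        clock += 1  # issuing any op costs the Integer Unit one clock
    return clock, max(clock, fpu_free_at)

print(simulate(["fpu", "int", "int"], 10))  # (3, 10)
print(simulate(["fpu"] * 6, 10))            # (11, 60)
```

With six back-to-back FPU ops of 10 clocks each, the sixth one finds the 
queue full, so in this model the Integer Unit finishes issuing at clock 
11 instead of clock 6.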

6. PFRCP has a note 1: "1) These instructions must wait for the FPU 
pipeline to flush. Cycle count depends on what instructions are in the 
pipeline."
PFRCPV does not have this note. Is that a bug? PFRCPIT1 has it but 
PFRSQRT does not. What is the reason? Or does it depend on 
whether the op is implemented in microcode?

7. What do Way0, Way1, Way2, or Way3 mean on page 617?
"3) Any needed memory operands are in the cache in the last accessed way 
(i.e., Way0, Way1, Way2, or Way3). Add two clocks if not in last 
accessed way."

8. On page 617:
"8) For non-cached memory accesses, add several clocks. Cache miss 
accesses are approximately an additional 25 clocks, the exact number 
depends upon the cycle/operation running."
Does this mean that a cache miss stalls the execution unit for ~25 
clocks? Is that the cost of going to main RAM? If so, how many clocks 
does a read from L1 take, and how many from L2? And in that case, does 
the miss stall the load/store unit, or can the execution unit that is 
not stalled still access memory?
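
For reference, this is how I would apply notes 3 and 8 on paper right 
now, as a minimal Python sketch. The two penalty values come from the 
quoted notes. That the penalties do not stack, and that the whole op 
simply takes that much longer, are my assumptions, and the base clock 
count in the example is a placeholder.

```python
WAY_MISS_PENALTY = 2     # note 3: "Add two clocks if not in last accessed way"
CACHE_MISS_PENALTY = 25  # note 8: "approximately an additional 25 clocks"

def operand_clocks(base_clocks, in_cache=True, in_last_way=True):
    """Estimated clocks for an op with one memory operand.

    base_clocks -- clock count from the instruction table (placeholder here)
    ASSUMPTION: a cache miss replaces the way penalty rather than adding to it.
    """
    if not in_cache:
        return base_clocks + CACHE_MISS_PENALTY
    if not in_last_way:
        return base_clocks + WAY_MISS_PENALTY
    return base_clocks

print(operand_clocks(3))                     # 3
print(operand_clocks(3, in_last_way=False))  # 5
print(operand_clocks(3, in_cache=False))     # 28
```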

9. Do you have any hard numbers for sequential/random 8-byte read/write 
speed on the OLPC machine (that is, with the exact RAM and LX 800 
processor used in the OLPC machine)? What about MOVNTQ sequential speed?

10. If MASKMOVQ skips some bytes, does that mean it will not read 
those skipped bytes from RAM? Or will it have the same speed as MOVNTQ?

11. Is the following referring to the FPU as a whole or just to the FP 
ops (FMUL/FSIN)?
http://mailman.laptop.org/pipermail/devel/2006-August/001323.html
"This is correct. Two FP instructions cannot be issued on subsequent 
cycles."

"The Geode pipeline is very simple. We're not superscalar in any way, 
shape or form." <- That is why I asked 4 and 5...

Note that I do not have an OLPC machine and will not ask for one before 
I have some working code. But I need this info to run the inner 
loops on a "paper processor"...

Thanks in advance!