Geode instruction scheduling questions
NoiseEHC at freemail.hu
Thu May 31 15:27:25 EDT 2007
I have some questions about instruction scheduling, since the PDF is
quite sparse on the subject.
I have also read this thread:
1. As I understand it, the Geode can only issue 1 op/clock even though
it has 2 execution units. Is that correct?
2. Does the IU (Integer Unit) perform integer MUL/DIV?
3. Is the IU pipelined? So the clock numbers are latencies or absolute
4. Is the FPU pipelined? So the clock numbers are latencies or absolute
5. There is the following text in page 656:
"The CPU is functionally divided into the Floating Point Unit (FPU) unit
and the Integer Unit. The FPU has been extended to
process MMX, AMD 3DNow!, and floating point instructions in parallel
with the Integer Unit.
When the Integer Unit detects an MMX instruction, the instruction is
passed to the FPU for execution. The Integer Unit continues
to execute instructions while the FPU executes the MMX instruction. If
another MMX instruction is encountered, the
second MMX instruction is placed in the MMX queue. Up to six MMX
instructions can be queued.
When the Integer Unit detects a floating point instruction without
memory operands, after two clock cycles the instruction
passes to the FPU for execution. The Integer Unit continues to execute
instructions while the FPU executes the floating
point instruction. If another FPU instruction is encountered, the second
FPU instruction is placed in the FPU queue. Up to
four FPU instructions can be queued. In the event of an FPU exception,
while other FPU instructions are queued, the state
of the CPU is saved to ensure recovery."
What is this 2-clock-cycle stall? Is it the time the op takes to pass
through unused stages? What about 3DNow! instructions?
6. PFRCP has a note 1 -> "1) These instructions must wait for the FPU
pipeline to flush. Cycle count depends on what instructions are in the
PFRCPV does not have this same note. Is that a bug? PFRCPIT1 has it but
PFRSQRT does not. What is the reason? Or does it depend on
whether the op is implemented in microcode?
7. What do Way0, Way1, Way2, or Way3 mean on page 617?
"3) Any needed memory operands are in the cache in the last accessed way
(i.e., Way0, Way1, Way2, or Way3). Add two clocks if not in last
8. On page 617:
"8) For non-cached memory accesses, add several clocks. Cache miss
accesses are approximately an additional 25 clocks, the exact number
depends upon the cycle/operation running."
Does this mean that a cache miss stalls the execution unit for ~25
clocks? Is that a miss to main RAM? If so, how many clocks does a read
from L1 take, and how many from L2? In that case, does it stall the
load/store unit, or can the execution unit that is not stalled still
access memory?
9. Do you have any hard numbers for sequential/random 8-byte read/write
speed on the OLPC machine (that is, with the exact RAM and LX 800
processor used in it)? What about MOVNTQ sequential speed?
10. If MASKMOVQ skips some bytes, does that mean it will not read
those skipped bytes from RAM? Or will it run at the same speed as MOVNTQ?
11. Is this referring to the FPU as a whole or just to the FP ops (FMUL/FSIN)?
"This is correct. Two FP instructions cannot be issued on subsequent
"The Geode pipeline is very simple. We're not superscalar in any way,
shape or form." <- That is why I asked 4 and 5...
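
To make question 9 concrete, the kind of measurement meant there could
look roughly like the sketch below. It is only an illustration: the
function names are made up for this example, it uses plain C stores and
loads rather than MOVNTQ, and on a 32-bit x86 target the compiler may
split each 64-bit access into two 32-bit moves, so any numbers from it
would be an approximation of the 8-byte case at best.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Sequentially write n 64-bit words; returns the number of bytes written.
 * (Hypothetical helper; volatile keeps the stores from being optimized away.) */
size_t seq_write64(volatile uint64_t *buf, size_t n, uint64_t value)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = value;
    return n * sizeof(uint64_t);
}

/* Sequentially read n 64-bit words; returns their sum so the loads are used. */
uint64_t seq_read64(const volatile uint64_t *buf, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

/* Time `reps` sequential write passes and return MB/s.
 * Returns 0.0 when the run is too short for clock() to resolve. */
double write_mb_per_s(volatile uint64_t *buf, size_t n, int reps)
{
    assert(buf != NULL && reps > 0);
    clock_t start = clock();
    size_t bytes = 0;
    for (int r = 0; r < reps; r++)
        bytes += seq_write64(buf, n, (uint64_t)r);
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    return secs > 0.0 ? (double)bytes / secs / 1e6 : 0.0;
}
```

Running write_mb_per_s() over a buffer much larger than the cache, with
enough repetitions to run for a second or more, would give a rough
sequential write figure. A read figure could be timed the same way
around seq_read64().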
Note that I do not have an OLPC machine and will not ask for one before
I have some working code. But I need this info for running the inner
loops on the paper processor...
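
As an illustration of what such a paper estimate might look like, here
is a small sketch of a clock counter. Everything in it is an assumption
that questions 1 to 5 are meant to confirm or correct: the IU issues
one instruction at a time, handing an FPU op over costs the IU 1 clock,
the FPU is not pipelined, and the IU stalls only when the FPU queue
(four entries in the datasheet text quoted in question 5) is full. The
2-clock delay from that text is not modelled, and the Op type and
function name are invented for this example.

```c
#include <assert.h>

#define MAX_QUEUE 16

/* One instruction: which unit runs it and its clock count (hypothetical type). */
typedef struct {
    int is_fpu;  /* 0 = Integer Unit, 1 = FPU */
    int clocks;  /* clock count taken from the instruction tables */
} Op;

/* Estimate total clocks for a straight-line sequence under the assumed model. */
int estimate_clocks(const Op *ops, int n, int queue_depth)
{
    int start[MAX_QUEUE] = {0}; /* start times of the last queue_depth FPU ops */
    int iu = 0;                 /* clock at which the IU is next free */
    int fpu_done = 0;           /* clock at which the FPU finishes its last op */
    int k = 0;                  /* number of FPU ops handed over so far */

    assert(queue_depth >= 1 && queue_depth <= MAX_QUEUE);

    for (int i = 0; i < n; i++) {
        if (!ops[i].is_fpu) {
            iu += ops[i].clocks;         /* integer op occupies the IU */
            continue;
        }
        /* Queue full: wait until the op queued queue_depth earlier has started. */
        if (k >= queue_depth && start[k % queue_depth] > iu)
            iu = start[k % queue_depth];

        int begin = iu > fpu_done ? iu : fpu_done; /* FPU runs ops one by one */
        start[k % queue_depth] = begin;
        fpu_done = begin + ops[i].clocks;
        iu += 1;                         /* assumed 1-clock hand-off */
        k++;
    }
    return iu > fpu_done ? iu : fpu_done;
}
```

With this model, an integer op, a 4-clock FPU op, and two more integer
ops of 1 clock each come out at 5 clocks in total, because the integer
ops overlap with the FPU op. If the FPU turns out to be pipelined, or
the clock numbers are latencies rather than occupancy, the model would
need to change accordingly.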
Thanks in advance!