XO-1.5 graphics clock control causes instability

Daniel Drake dsd at laptop.org
Mon Sep 24 10:20:53 EDT 2012


Hi,

A couple of months ago, we ran an XO-1.5 suspend/resume test outside
of X and found that it would frequently fail to resume. Upon resume,
the system would spontaneously reboot. No kernel messages were printed
during the resume process over serial, only "+X" or "+r+X", which is
OpenFirmware saying "memory contents are bad, I can't figure out how
to resume".

This was filed as #12039.

Wad suggested that the most likely cause is memory rot, probably
caused by some device accessing main memory (e.g. with DMA) while the
CPU was asleep. Such an access would bring the memory out of
self-refresh, and since the CPU is asleep, the memory contents would
then be lost.

I've investigated this and I have a workaround (accepted upstream,
pushed for next build). Since it's an odd issue and likely to bite
again at some point, here are the details:

viafb recently started tweaking the state of the primary (IGA1) and
secondary (IGA2) clocks and PLLs (e.g. in commit b692a63af8b6).
It is the tweaking of the clock state that causes instability. The
clocks in question are configured by IO Port 3C5.1B ("Power Management
Control 1"), bits 5:4 (IGA1) and 7:6 (IGA2).

These clocks can be configured in 3 ways:
 1. Always on
 2. Always off
 3. Clock auto on/off according to power management status (default)
(it's not clear how "power management status" is defined)
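To make the register layout concrete, here is a minimal C sketch of a
read-modify-write on an SR1B value that replaces one 2-bit clock field
(bits 5:4 for IGA1, bits 7:6 for IGA2) while leaving the other bits
alone. The macro and function names are hypothetical and are not
viafb's. The actual port I/O (selecting index 0x1B through the VGA
sequencer index port 3C4, then accessing data port 3C5) is left out.
This post does not give the numeric encoding of on/off/auto, so the
caller supplies the raw 2-bit mode value.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions in SR1B ("Power Management Control 1"). */
#define SR1B_IGA1_SHIFT 4    /* bits 5:4 */
#define SR1B_IGA2_SHIFT 6    /* bits 7:6 */
#define SR1B_FIELD_MASK 0x3u /* each field is 2 bits wide */

/* Return `sr1b` with one clock field replaced by `mode`; all other bits
 * are preserved. The meaning of each `mode` value (on, off, auto) must
 * come from the chipset documentation, not from this sketch. */
static uint8_t sr1b_set_clock_field(uint8_t sr1b, unsigned shift, uint8_t mode)
{
	assert(mode <= SR1B_FIELD_MASK);
	assert(shift == SR1B_IGA1_SHIFT || shift == SR1B_IGA2_SHIFT);
	uint8_t mask = (uint8_t)(SR1B_FIELD_MASK << shift);
	return (uint8_t)((sr1b & ~mask) | ((mode << shift) & mask));
}

/* Extract the current 2-bit clock field for IGA1 or IGA2. */
static uint8_t sr1b_get_clock_field(uint8_t sr1b, unsigned shift)
{
	return (uint8_t)((sr1b >> shift) & SR1B_FIELD_MASK);
}
```

For example, writing mode 0x2 into the IGA1 field of 0x00 yields 0x20,
and clearing the IGA2 field of 0xFF yields 0x3F. Masking before the OR
keeps a write to one IGA's clock from disturbing the other's.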

That suggests there are 2 actual states for these clocks - on or off.
However, my findings suggest that there must be a 3rd state at play,
since:

IGA2 clock on = unstable
IGA2 clock off = no display
IGA2 clock auto = stable

For completeness, IGA1 behaviour:

IGA1 clock on = unstable
IGA1 clock off = stable
IGA1 clock auto = stable

("unstable" means that the system will fail with the +X condition in a
suspend/resume loop after a fairly small number of cycles; "stable"
means the system survived the same loop overnight.)

viafb now avoids touching these clocks on XO-1.5 (pushed to x86-3.3 as
3ef9f18dfe) and things are stable.
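The shape of such a workaround can be sketched as a guard in front of
the clock-state write. This is a minimal, hypothetical C illustration,
not the code in 3ef9f18dfe: `running_on_xo15`, `write_iga_clock` and
`set_iga_clock_state` are invented names, and the register write is
replaced by a counter so the logic can be exercised without hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical clock states the driver might request. */
enum clock_state { CLOCK_ON, CLOCK_OFF };

/* Hypothetical platform flag: true when running on an XO-1.5. */
static bool running_on_xo15 = false;

/* Counts hardware writes; stands in for the real SR1B update. */
static int clock_writes = 0;

static void write_iga_clock(unsigned iga, enum clock_state state)
{
	(void)iga;
	(void)state;
	clock_writes++;
}

/* Request a new IGA clock state, except on XO-1.5, where the clocks are
 * left at their default "auto" setting because forcing them on or off
 * was found to break suspend/resume. Returns true if a write happened. */
static bool set_iga_clock_state(unsigned iga, enum clock_state state)
{
	assert(iga == 1 || iga == 2);
	if (running_on_xo15)
		return false; /* don't touch the clocks on this platform */
	write_iga_clock(iga, state);
	return true;
}
```

Skipping the write entirely, rather than trying to restore "auto" later,
means the register never leaves the default state that tested stable.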

Daniel


