[OLPC-devel] Thoughts on S3 resume.

Jim Gettys jg at laptop.org
Sat Aug 5 17:03:23 EDT 2006


On Sat, 2006-08-05 at 10:12 -0700, Mark J. Foster wrote:
> Hi, Folks!
> 
> Fast resume from S3 isn't at all a trivial issue.  The following is a
> brainstorming report that Jim and I wrote up back in April to help
> stir the pot.  In essence, the challenge is how to reinitialize all of
> the system devices which have been powered-down as quickly as
> possible.  

Linux does this; it just does reinitialization serially.  Thankfully,
it does not use ACPI for its reinitialization, but uses code in each
driver.  Even serially, Linux on the iPAQ managed to get running in
10 ms.

But there are some notorious device drivers out there, written to talk
to some broken hardware.  Whether we face these on our hardware is
probably best determined experimentally, rather than by reading the
code.  After all, we're going to have to measure it anyway ;-).

> In an effort to push the envelope, we've hypothesized one possible
> approach that could maximize resume speed.  Note that actually
> implementing this would be anything but trivial, since it touches
> "everything"; even more important, we don't even know if it is
> necessary!  It is quite possible that the current resume
> infrastructure is fast enough not to need all of this work.
> Nonetheless, what this white paper does do is to help inform folks of
> some of the challenges associated with trying to enter S3 after
> perhaps one second of system inactivity...i.e. orders of magnitude
> faster than current systems attempt to do.

What we don't know, and I want numbers as soon as possible, is whether
we have problem drivers on our hardware, and how long it takes our
hardware to start executing Linux code.

On the iPAQ, for instance, the hardware seems to take between 280 and
400 ms doing absolutely nothing before code starts executing (more
precisely, from power on until the point the bootloader transfers
control to Linux).

So there are several outcomes possible:
  1) LinuxBIOS/Linux resumes fast enough, as on the iPAQ (and our
hardware is not as slow as the iPAQ) - no work needed
  2) a single driver dominates resume time, waiting for some piece of
braindead hardware (presuming it isn't an out-and-out bug in the
driver, or the driver working around some particularly broken variant
of the hardware; this is not uncommon on PC hardware). In this case,
we'll probably have to go do work on that driver, maybe using lazy
evaluation tricks to hide the resume time elsewhere.
  3) we find we have multiple drivers that take significant, but not
horrible, lengths of time, which we'd like to overlap.  Then Mark's
idea is absolutely stupendous.  I'd love to see someone implement the
idea anyway, as it would be dynamite on servers where you are
reinitializing tons of hardware at once (10-1000 disk drives, for
example).

With a bit of luck, I'll take the measurements this next week that will
begin to tell us what is happening on our hardware.
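
For the curious, the sort of crude instrumentation I have in mind
looks roughly like this (a sketch only; resume_one() and its argument
are stand-ins for whatever the real per-driver resume hook turns out
to be):

    /* Read the CPU timestamp counter; the GX2 is x86, so rdtsc is
       available once the CPU is executing again, and it counts up
       from reset, so deltas within the resume path are meaningful. */
    static inline unsigned long long tsc(void)
    {
            unsigned int lo, hi;
            __asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
            return ((unsigned long long) hi << 32) | lo;
    }

    extern void resume_one(const char *name);

    /* Bracket each driver's resume callback and log the delta. */
    static void timed_resume(const char *name)
    {
            unsigned long long t0 = tsc();

            resume_one(name);
            printk("%s: resume took %llu cycles\n", name, tsc() - t0);
    }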

In any case, Mark's fast resume idea is a good one, even if we find we
don't need to use it for OLPC.
                                              - Jim


> 
> Please accept my apologies for the HTML content.
> 
> Cheers!
> MarkF
> 
> > Rapid S3: A Brainstorming Proposal
> > 
> > Introduction:
> > 
> > The One Laptop Per Child (OLPC) initiative must utilize very
> > aggressive power management solutions in order to deliver the most
> > cost-effective solution for our "$100 laptop."  A significant
> > fraction of school age children lack power at home, most lack power
> > in their schools and virtually none have it at their desk.
> > Moreover, when power is available, it is often unreliable.
> > Human-powered machines which do not require AC line power are
> > therefore a necessity, rather than a luxury.  To this end, our
> > laptop must be highly power-efficient, with an average power
> > consumption target of less than 2 watts, versus 10-30 watts in
> > traditional machines.  Power management is key.
> > 
> > For various reasons, our first-generation power management
> > architecture will focus on extremely fast entry into, and exit from,
> > "Suspend To RAM" (known as "S3") mode - this is made possible via an
> > intelligent display controller that can continue to refresh the
> > display while the CPU is shut down.  Additionally, the chosen
> > wireless chip can continue to forward packets autonomously during
> > suspend, with the capability of waking the primary processor when
> > required.  Further, since the system is Flash-based, there are no
> > disk or disk controller latencies to slow the process.
> > 
> > Using S3 is certainly not unusual, but what is unique is our goal of
> > utilizing S3 with sub-second timeframes.  While good profiling data
> > is currently lacking, a number of embedded Linux systems (e.g. the
> > HP iPAQ and Nokia 770) can suspend on the order of a second, and
> > resume in a fraction of a second.  The asymmetry in these devices'
> > behavior is not yet understood; investigation has just begun.
> > 
> > Specifically, we'll actually be entering and exiting S3
> > in-between keystrokes whenever there is a significant pause, similar
> > to the use of "stop-clock" mode on conventional notebooks.  This is
> > a challenging goal, yet one which we believe is attainable.
> > Accomplishing this task will, however, require the cooperation of
> > many members of the Open Source community.  In order to help define
> > the best solution, this document is intended to solicit feedback
> > from those folks who are most knowledgeable about the relevant code
> > subsystems, including the Linux kernel and device drivers, the Linux
> > ACPI interface, the X Window System architecture and drivers,
> > and the LinuxBIOS.
> > 
> > 
> > First steps:
> > 
> > Our first task will be to profile X Window System activity in order
> > to better quantify user behavior - this will help to determine the
> > proper delay before initiating suspend requests.  In addition, we'll
> > need to instrument the suspend/resume cycle and understand the
> > current asymmetry between suspend and resume on the systems that
> > currently complete the suspend/resume cycle most quickly. We do not
> > understand why, on the systems we've examined, suspend is so slow.
> > Our current best theory is that people care more about wakeup than
> > suspend, and that there is just "something dumb" going on during
> > suspend. Anyone with suitable instrumentation to create such profile
> > information (particularly for an X86 implementation) is encouraged
> > to make the information available to the community.  
> >  
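> > As a strawman, inter-event gaps can be sampled from the Linux evdev
> > layer rather than from X itself; a minimal sketch (the device path
> > is illustrative):
> > 
> >         #include <fcntl.h>
> >         #include <stdio.h>
> >         #include <unistd.h>
> >         #include <linux/input.h>
> > 
> >         int main(void)
> >         {
> >             struct input_event ev;
> >             double last = 0.0;
> >             int fd = open("/dev/input/event0", O_RDONLY);
> > 
> >             if (fd < 0)
> >                 return 1;
> >             while (read(fd, &ev, sizeof ev) == sizeof ev) {
> >                 if (ev.type != EV_KEY || ev.value != 1)
> >                     continue;   /* count key presses only */
> >                 double now = ev.time.tv_sec + ev.time.tv_usec / 1e6;
> >                 if (last != 0.0)
> >                     printf("gap: %.3f s\n", now - last);
> >                 last = now;
> >             }
> >             return 0;
> >         }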
> > 
> > Facts vs. Guesses:
> > 
> > In many ways, this document is admittedly premature, since the OLPC
> > team does not (yet) have good measurement statistics for the time
> > durations that each of the relevant code subsystems require to enter
> > and exit S3.  Casual observation would suggest that most S3
> > implementations greatly exceed the necessary timeframes, but
> > detailed profile information for these transitions would be
> > enormously valuable, and would be greatly appreciated!  
> > 
> > In the absence of suitable profiling data, the remainder of this
> > proposal makes the (quite possibly inappropriate) assumption that
> > existing code and interfaces may not be suitable to achieve the
> > performance that's necessary for the OLPC machine.  We therefore
> > recklessly propose some possible implementations, not because
> > they're optimal (or even rational), but simply to serve as a
> > convenient means of documenting the performance characteristics that
> > we seek, and to serve as a "brainstorming seed" for those who know
> > better!  Throughout, specific code areas that are believed to be
> > potentially problematic are highlighted.  Feedback on any or
> > all of these issues will be very much appreciated by the OLPC team!
> > 
> > 
> > Band-Aids and Duct Tape:
> > 
> > Sadly, even though the OLPC laptop is a non-profit project, we're
> > still burdened with the egregious spectre of a tight schedule.
> > Despite our best wishes, we can't realistically aim for the ultimate
> > in structural enhancements, such as system-wide device driver
> > structures that provide parsable, completely generic device
> > subsystem interrelationship information, nor do we have the luxury
> > of significant changes to the system's existing hardware
> > architecture.  Consequently, we, like the rest of the world, seek a
> > happy (or at least not entirely depressing) medium.
> > 
> > As one possibility, what about lying?  To what extent might the
> > inherent structural challenges be ameliorated by claiming that we're
> > entering S1 (stop-clock) state, instead of S3 (suspend) state, while
> > playing a few device driver games?  What hidden demons might lurk
> > within such a disreputable approach?  Alternately, might it be best
> > to just live in a fantasy/dream state, where we mystically maintain
> > and restore device state information in such a manner as to tell
> > ACPI little to nothing (i.e. far less than it expects)?
> > 
> > 
> > The Goal:
> > 
> > Our target response to an input event (keyboard or mouse) is 200 ms,
> > since that results in a minimally perceptible response delay.  CHI
> > research over the years shows that humans are less sensitive to
> > keyboard delays than to other delays (e.g. rubber-banding or
> > dragging operations in user interfaces).  Due to the nature of the
> > AMD GX2 CPU platform at the core of the machine, 150 ms of this time
> > is already required for the CPU's reset delay, so only 50 ms are
> > available for code execution to satisfy the 200 ms target.  At this
> > latency, research suggests the delay will be perceptible, but not
> > highly objectionable.  The specific goal for exiting S3 is therefore
> > 50 ms: the total elapsed clock time from the CPU fetching the reset
> > vector until the kernel hands control back to a user-space
> > application.  In the future, on hardware platforms that are
> > explicitly optimized for fast S3, significantly lower targets would
> > be desirable.
> > 
> > Entry into S3 is also a challenge, given fundamental requirements
> > such as activity detection.  Fortunately, the hardware component of
> > entry into S3 should only be on the order of 10 ms.  In the absence
> > of hard data, we'll arbitrarily pick a goal for entry into S3 of
> > 100 ms.  This leaves an effective 90 ms of clock time for the
> > activity-detection timeout, saving the state of the machine into
> > RAM, and execution of the S3 transition code.  In the future, we'd
> > like to be able to make such a transition in as little as 20
> > milliseconds, at which point the delays will usually be below human
> > perception.  So for our initial goal, we're shooting for five times
> > the ideal.
> > 
> > 
> > Secret Sauce:
> > 
> > It's worth noting a couple of important factors that
> > significantly assist in meeting these performance goals.  First, the
> > OLPC machine utilizes a NAND Flash-based JFFS2 filesystem, so
> > conventional IDE drives are not an issue.  Second, as suggested
> > previously, it is not necessary to save or restore the state of the
> > display.  A custom ASIC provides continuous screen refresh whether
> > or not the CPU is powered-on, even though the OLPC machine's GX2
> > processor utilizes a Unified Memory Architecture for the frame
> > buffer.  Therefore, low-level video code will not be a significant
> > factor (further details regarding this ASIC are beyond the scope of
> > this document).
> > 
> > In addition, the Marvell wireless chip allows us to have the wireless
> > alive and able to wake the processor when packets are addressed to it
> > (while continuing to forward packets in the mesh autonomously).
> > 
> > Problems:
> > 
> > Today, many drivers utilize somewhat arbitrary hardware wake-up
> > delays, sometimes for good reasons, and sometimes for bad reasons.
> > Fortunately, our machine lacks some of the "features" found on most
> > laptops that aggravate the problem (e.g. IDE controllers and disk
> > devices).  We will obviously tune the delays in the drivers for the
> > specific instance of hardware we have, and as we plan to use
> > LinuxBIOS, we are free to work at all layers of the stack.
> > 
> > Presuming that we cannot meet the above goals "out of the box", we
> > have several alternatives:
> > 
> > 
> > Solution 1:
> > -----------
> > 
> > Modify the power management system so that whenever a significant
> > delay is reached in one driver, work can proceed on the state
> > transitions of drivers for other hardware (as sketched below).
> > Obviously, this will have to respect bus and device topology in the
> > usual fashion.  With luck, this *might* get us to our initial goal;
> > we wonder, however, whether it can get us to our longer-term ideal.
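> > 
> > To make the intent concrete, here is a user-space-flavored sketch
> > (POSIX threads stand in for whatever kernel mechanism would really
> > be used, and the resume_*() routines are hypothetical):
> > 
> >         #include <pthread.h>
> > 
> >         /* Hypothetical resume routines for two devices that sit
> >            behind the same, already-resumed bus. */
> >         extern void *resume_wireless(void *unused);
> >         extern void *resume_usb(void *unused);
> > 
> >         static void resume_leaf_devices(void)
> >         {
> >             pthread_t t[2];
> > 
> >             /* The drivers' internal delays now overlap instead
> >                of adding up. */
> >             pthread_create(&t[0], NULL, resume_wireless, NULL);
> >             pthread_create(&t[1], NULL, resume_usb, NULL);
> >             pthread_join(t[0], NULL);
> >             pthread_join(t[1], NULL);
> >         }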
> > 
> > 
> > Straightforward... but?:
> > 
> > The most obvious implementation method would be to use the existing
> > ACPI structure to implement the OLPC power architecture.  However,
> > we are concerned, based solely on user-level observation of existing
> > implementations, that this may be insufficient for the task.  At the
> > macro level, we observe seconds-level delays during most transitions
> > into S3.  Potentially problematic components of this delay may
> > include:
> >      1. Activity detection.  It may well be that current
> >         implementations wait for a substantial window of time for
> >         process execution to complete.  If so, a more efficient
> >         means of activity detection is necessary.
> >      2. Serial execution.  The current structure would seem to
> >         require serial execution of the suspend code in each of the
> >         system's device drivers, even though it is theoretically
> >         possible to parallelize this execution in most cases
> >         (obvious exceptions to parallelization include drivers for
> >         bus bridges, etc).
> >      3. Conservative device drivers.  In order to maximize
> >         portability, it is quite common to see delays on the order
> >         of hundreds of milliseconds in device driver code to ensure
> >         device inactivity.  While this is quite reasonable given the
> >         current usage of S3, it will be necessary to conform much
> >         more closely to data sheet minimums on the OLPC machine to
> >         deliver the necessary performance targets.
> >      4. Sequential delays.  Due to items (2) and (3), substantial
> >         time appears to be wasted on delays that might otherwise be
> >         concurrent.  For example, if there are 10 devices in the
> >         system, each of which requires a 20 ms delay, the current
> >         structure would seem to mandate 200 ms of clock time for
> >         concatenated delays, when a single 20 ms parallel delay
> >         should be sufficient.
> >      5. Unnecessarily complex state capture.  Hardware isn't always
> >         friendly, and many drivers have to wind up doing strange or
> >         unusual tricks to record the current state of a device.
> >      6. Assumptions of PC compatibility requirements - closely
> >         related to item (5).  Thank goodness, much less of this is
> >         seen on Linux than with other platforms, but device drivers
> >         which originate on the PC platform often inherit the habit
> >         of recovering the current device state from hardware, even
> >         though it would be far faster to maintain state in the
> >         device driver itself.
> >      7. Unnecessary state saves.  As before, there is no need to
> >         save the frame buffer contents of the OLPC machine.  In a
> >         similar vein, it is unnecessary to save the state of the
> >         OLPC's wireless network, for it stays alive (thanks to an
> >         on-chip CPU and embedded RAM) even if the primary CPU is
> >         powered off.
> > On highly portable machines which have received special
> > optimization, exit from S3 is considerably faster than entry into
> > S3, but still appears to be far slower than is needed for the OLPC
> > machine.  Components of this delay might include:
> >      1. Assumed "power stabilization" delays.  There are, in fact,
> >         many machines which do require hard delays before they are
> >         fully functional.  On the OLPC machine, no such delay will
> >         be necessary.  Once execution begins, full speed ahead!
> >      2. Serial execution.  As with suspend requests, it would appear
> >         that sequential driver execution is mandated during system
> >         resume, despite the fact that most drivers could be executed
> >         concurrently, if sufficient information were somehow made
> >         available regarding the structure of, and interaction
> >         between, device subsystems.
> >      3. Conservative device drivers.  During resume / exit from S3,
> >         device driver delays are even more common, and more
> >         prolonged.  Again, this is defensible in pursuit of maximum
> >         portability, but is problematic when high performance is
> >         required.
> >      4. Sequential delays.  Ditto.  Initialization / reset delays
> >         are an even greater concern than suspend delays.
> >      5. Unnecessary delays.  In cases such as IDE initialization,
> >         it's very common to see extremely long delays for device
> >         spin-up.  In a machine with no rotating drives, this is
> >         clearly irrelevant.
> >      6. Polling for nonexistent devices.  Since this is a restricted
> >         machine with predefined peripherals, time spent searching
> >         for peripherals that "might exist" on more expandable
> >         machines is an expensive waste.  The only exception is the
> >         OLPC machine's three USB 2.0 ports, which will undoubtedly
> >         require special treatment.
> >      7. Unnecessary state restores.  Again, it is unnecessary to
> >         restore the state of the display or wireless subsystems.
> >         Today, tens or hundreds of milliseconds are often spent
> >         repainting the screen after a resume, even though this is
> >         unnecessary (and very undesirable) on the OLPC machine.
> >         Handling applications which perform "frivolous" writes (such
> >         as on-screen clocks or even worse, animated GIFs) will
> >         require a bit more care.  Handling wireless resume in a
> >         typical fashion would be a gargantuan mistake, since
> >         repeating a DHCP request, and reinitializing the TCP/IP
> >         stack once or twice each second would be quite problematic,
> >         to say the least!
> > 
> > Solution 2:
> > -----------
> > 
> > To help provide a more concrete basis for discussion and flames,
> > we'll next discuss one possible technique for achieving very fast S3
> > transitions.  To do this, we'll hypothesize a new core "Fast S3
> > Management" (FS3M) driver.  Located either within the system's BIOS,
> > or as a standard Linux device driver, FS3M's primary goals include:
> >       * Serving as a central repository for a machine's current
> >         state  - including the state of all devices.
> >       * Providing an efficient interface which allows device drivers
> >         to "register" the current device state whenever a state
> >         change occurs.
> >       * Delivering an efficient means of rapidly restoring a machine
> >         to a previously recorded state.
> >       * Creating a lightweight method for enabling the
> >         parallelization of device initialization / state restoration
> >         in a straightforward, easy to control manner.
> >       * Utilizing generic data structures that compactly encode
> >         device and machine state.
> > FS3M as envisioned here would contain a sorted array of
> > "initialization lists" (or "init lists").  Conceptually, an init
> > list is simply a list of memory or I/O ports to be written, which
> > also contains the data to be written to each port.  A single init
> > list may contain all of a device's state, or only a portion of it;
> > however, it's easiest to think of one init list as containing the
> > state of a single device, and as therefore providing all of the
> > information that's necessary to restore the state of that device.
> > 
> > Throughout this document, the terms "restore" and "initialize" are
> > essentially used interchangeably.  This is intentional.  A properly
> > defined init list should be capable of initializing a device to a
> > given state at any point in time.  At power-up, we can think of this
> > process as initialization, but since the initialization is to a
> > previously recorded state, it is identical to "restoring" that
> > previous state.
> > 
> > In the real world, device initialization often requires operations
> > that are more complex than basic I/O or memory writes.  It is
> > therefore important to be able to encode a richer set of information
> > that better satisfies the needs of most devices.  Rather than
> > abandon the high-level picture to elaborate at this point, Appendix
> > A contains a reasonably detailed proposal which identifies the
> > broader set of capabilities that FS3M's init lists should support.
> > For now, just remember that init lists are simply lists that encode
> > device state.
> > 
> > As a machine contains multiple devices, a machine's state may
> > therefore be encoded via multiple init lists.  A key problem,
> > however, is finding a way to organize multiple init lists so as to
> > capture the actual dependencies of one device upon another: memory
> > controllers must be initialized before memory may be written, bus
> > bridges must be initialized before devices located on that bus can
> > be initialized, etc.  To restate the issue, as long as the
> > collection of init lists is executed in the proper sequence, it
> > will be possible to deterministically initialize a machine to a
> > specific known state - i.e. to restore it to that state.
> > 
> > Due to the enormous variety of system architectures, it is not
> > generally possible to automatically determine the proper location of
> > a device within a machine's device hierarchy.  Therefore, FS3M
> > relies on the programmer to provide simple ID numbers that specify
> > how a specific init list relates to other init lists.  Further, this
> > ID information is used to identify when multiple devices are at the
> > same point in the device hierarchy.  This specific relationship is
> > quite important in FS3M, for devices with the same position in the
> > hierarchy may be initialized in parallel, which can dramatically
> > accelerate the process of restoring a machine's state.
> > 
> > Before proceeding, it's important to emphasize that, while a
> > conventional hardware device hierarchy and the init list hierarchy
> > are similar, they are not identical.  FS3M's hierarchy is solely
> > determined by the potential for simultaneous initialization of
> > different init lists -- it is the programmer's responsibility to
> > ensure that init lists which cannot be executed in parallel use IDs
> > that prevent them from being so executed.
> > 
> > FS3M's hierarchy is extremely simple.  Each init list has two ID
> > values: a "major ID" and a "minor ID".  Those init lists with
> > identical major IDs are considered to be at the same point in the
> > hierarchy, and the minor ID value simply differentiates between
> > lists for administrative purposes (such as knowing when a new init
> > list/state should replace an existing init list/state).  To
> > illustrate how this hierarchy may differ from a normal hardware
> > hierarchy, devices at completely different levels within the
> > hardware hierarchy may share the same major ID: for instance, a PCI
> > bridge and an on-chip video controller might share the same
> > hierarchy level as an init list which encodes the state of a status
> > display.
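> > 
> > In C terms, an ID might be nothing more than this (a sketch; the
> > field widths are arbitrary):
> > 
> >         /* Init lists sharing a major ID may execute in parallel;
> >            the list manager walks major IDs in ascending order. */
> >         struct fs3m_list_id {
> >             unsigned short major;   /* hierarchy level */
> >             unsigned short minor;   /* which list within a level */
> >         };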
> > 
> > This structure leads to a simple conclusion: the most important part
> > of using init lists is the assignment of ID numbers.  At one end of
> > the spectrum, assigning every init list the same major ID will
> > lead to a complete mess, as devices which interrelate are
> > initialized at the same time as the bus bridges which connect them
> > to the CPU.  At the other extreme, assigning a different major ID to
> > each init list will prevent any opportunity for parallelization,
> > leading to unnecessarily slow initialization.  You might wish to
> > start this way for simplicity during initial debugging, but this
> > should be optimized once the initialization values are known to be
> > correct.  In the authors' opinion, the easiest method of determining
> > the proper hierarchy level is to visually graph the different init
> > lists in the system, and to rearrange them until a reasonably
> > efficient hierarchy is determined.
> > 
> > A further level of optimization can also be quite beneficial when
> > high performance is required: splitting init lists.  Consider the
> > case where two devices are located at the same point in the
> > hierarchy, but one device requires only 50 us to initialize, while
> > the second device requires a 10 ms delay during its initialization
> > sequence.  Since FS3M cannot proceed to the next level in the
> > hierarchy until all devices in the current level have been
> > initialized, this case would result in an unnecessary ~9.95 ms
> > delay.  To resolve this, it is relatively straightforward to split
> > the init list that requires the long delay into two components: one
> > list that performs the first section of the initialization prior to
> > the delay, and a second list that performs the remainder of the
> > initialization at a different hierarchy level.  However, it is
> > important to note a significant caveat when splitting init lists.
> > As currently envisioned, FS3M will not include provisions for
> > relating init lists at different hierarchy levels by time.  In
> > other words, when removing a long delay from an init list and
> > splitting the list at the delay point, it is the programmer's
> > responsibility to ensure that execution of intermediate init lists
> > at lower hierarchy levels will inherently provide sufficient delay
> > before the second component of the device's init list is executed.
> > If not thought through, this could make maintenance more
> > challenging.  Once again, careful assignment of IDs is key to
> > delivering the highest performance from init lists.
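> > 
> > Using the list notation introduced in Appendix A, such a split
> > might look like this (major IDs shown in comments; a sketch only):
> > 
> >         /* Major ID 2: assert reset, then end the list. */
> >         {{IO_WORD_SET, CRT_MODE_REG, CRT_RESET | CRT_BLANKING},
> >          {INIT_END_LIST, NULL, 0}};
> > 
> >         /* Major ID 4: everything at major ID 3 runs in between,
> >            and must consume at least the required 10 ms. */
> >         {{IO_WORD_CLEAR, CRT_MODE_REG, CRT_RESET},
> >          ...
> >          {INIT_END_LIST, NULL, 0}};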
> > 
> > 
> > FS3M In Operation:
> > 
> > If FS3M were completely integrated into a system, it would be
> > incorporated into the system's boot ROM/BIOS, and that ROM would
> > contain hard-coded default init lists that bring up the machine to
> > its power-up state.  Some of these init lists might be quite short -
> > even two lines long - typically lists that execute suitable
> > native-language machine diagnostics at the appropriate points in
> > time.  Just before executing the system's boot loader, the ROM would
> > tell FS3M (via delete_init_list(id) calls) to remove the
> > diagnostics-related init lists, thus leaving FS3M with a complete
> > set of init lists that properly describe the power-up state, without
> > the need to rerun diagnostics.
> > 
> > Alternately, FS3M may be implemented as a core O.S. device driver,
> > though in this fashion, it obviously can't speed up the cold boot
> > process.  In either ROM-based or device-driver based form, as
> > execution proceeds, (other) device drivers will call the FS3M's
> > update_init_list(id, list_ptr) function with new initialization
> > information each time the state of a device changes.  Realistically,
> > since the actual machine state is changing every few nanoseconds,
> > some degree of judgement is required to determine when to call
> > update_init_list().  For example, it might be overkill to call
> > update_init_list() every time the cursor is updated.  If so, it's
> > possible to special-case high frequency state changes such as this
> > one by polling the device state and executing the relevant
> > update_init_list() call as a component of the suspend code.
> > However, use this technique sparingly, or else entry into S3 becomes
> > too slow.  In any event, each device driver may select the update
> > rate which is most appropriate for that device.
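> > 
> > For concreteness, the driver-facing surface might be as small as
> > the following (a sketch, not a spec; the entry structure itself is
> > sketched in Appendix A):
> > 
> >         /* Register, or replace, the init list identified by id. */
> >         int update_init_list(struct fs3m_list_id id,
> >                              const struct init_entry *list);
> > 
> >         /* Remove a list, e.g. one used only for diagnostics. */
> >         int delete_init_list(struct fs3m_list_id id);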
> > 
> > With the exception of the special cases noted above, FS3M always has
> > a complete record of the current state of the machine, so entry into
> > S3 is extremely fast.  Upon resume, FS3M's init list manager uses
> > this state information to restore the entire state of the machine as
> > quickly as possible, in a manner that is completely transparent to
> > all other device drivers.  In fact, if an external entry point is
> > supported, it's essentially possible for FS3M to suspend and resume
> > transparently to the operating system.  As suggested in the "Band
> > Aids and Duct Tape" section, some of our key questions relate to the
> > degree to which we should hide these fast S3 transitions, or whether
> > it would be better to integrate them with the current Linux power
> > management architecture.  Feedback from the community is
> > particularly important regarding this decision.
> > 
> > 
> > Summary:
> > 
> > Power management is critically important to the success of the One
> > Laptop Per Child project.  Within this paper, we've presented both
> > straightforward extensions to the current power management
> > infrastructure, as well as a more radical approach that focuses on
> > higher performance.  We seek the involvement of the Open Source
> > community to find the best solution for our machine - please let us
> > know what you think!
> > 
> > 
> > 
> > 
> > Appendix A - Initialization Lists:
> > 
> > As mentioned previously, FS3M is envisioned to use a sorted array of
> > initialization lists.  While there are many other ways of
> > implementing this functionality, the following section suggests one
> > possible approach for structuring the data that's needed to
> > efficiently record and restore device state - it is only included
> > here to show the relative simplicity and compactness of the
> > list-based approach.
> > 
> > In this example, each one of the FS3M initialization lists would
> > contain a sequence of <type>,<address>,<data> triplets.  Each
> > specific list will nominally contain the I/O and memory cycles that
> > are necessary to initialize a specific device, where a "device"
> > matches the device driver's definition of one device.  The <type>
> > field would be bit-mapped to define:
> >       * The address space: memory or I/O
> >       * The data width: BYTE/WORD/DWORD/QWORD
> >       * Command type, including: Set bits, Clear bits, Write bits,
> >         Read bits, Delay, Exec subroutine, or End list.
> >       * Length field.  On Writes, this field is used to support
> >         block writes.  During Read or Exec commands, the length
> >         field is cast as a delay/timeout field.  The length field is
> >         ignored during the read-modify-write commands of Set bits
> >         and Clear bits, and during the End list command.  To improve
> >         source code clarity, a length value of 0 is equivalent to
> >         specifying a value of 1.
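> > One possible C rendering of such a triplet (a sketch; field widths
> > are arbitrary):
> > 
> >         struct init_entry {
> >             unsigned int  type;   /* space, width, command, length */
> >             unsigned long addr;   /* I/O port or memory address */
> >             unsigned long data;   /* value, mask, or parameter */
> >         };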
> > As an example, a list for initializing a simplistic XGA CRT
> > controller might appear something like this:
> >         {{IO_WORD_SET, CRT_MODE_REG, CRT_RESET | CRT_BLANKING},
> >          {IO_WORD_CLEAR, CRT_MODE_REG, CRT_RESET},
> >          {IO_WORD_WRITE, CRT_HSTART_REG, 0},
> >          {IO_WORD_WRITE, CRT_HSIZE_REG, 1024},
> >          ...
> >          {MEM_QWORD_WRITE | CRT_QWORDS, frame_buffer_ptr, 0},
> >          {IO_WORD_CLEAR, CRT_MODE_REG, CRT_BLANKING},
> >          {INIT_END_LIST, NULL, 0}};
> > Simple enough so far.  All we've done is to replace I/O and memory
> > initialization code with a data table.  Sometimes, though, it's
> > necessary to poll a status bit during device initialization.  In
> > this case, when a list entry is a "Read bits" command, two data
> > structure changes occur.  First, the initial length field is
> > interpreted as a timeout value.  In addition, the subsequent list
> > entry is recast as an
> > <error_routine_parameter>,<error_routine_addr>,<bit_mask> triplet.
> > 
> > Let's imagine that the CRT's status port contains an active-high
> > CRT_READY bit and an active-low CRT_CLOCK_LOCKED bit, and we want a
> > 5 ms timeout - if the proper status is not returned within 5 ms, the
> > list should call crt_error(CRT_INIT_ERROR).  The CRT init list might
> > now look like this:
> >         {{IO_WORD_SET, CRT_MODE_REG, CRT_RESET | CRT_BLANKING},
> >          {IO_WORD_CLEAR, CRT_MODE_REG, CRT_RESET},
> >          {IO_WORD_WRITE, CRT_HSTART_REG, 0},
> >          {IO_WORD_WRITE, CRT_HSIZE_REG, 1024},
> >          ...
> >          {MEM_QWORD_WRITE | CRT_QWORDS, frame_buffer_ptr, 0},
> >          {IO_BYTE_READ | (5 * MILLISECONDS), CRT_STATUS_REG, CRT_READY},
> >           {CRT_INIT_ERROR, crt_error, CRT_READY | CRT_CLOCK_LOCKED},
> >          {IO_WORD_CLEAR, CRT_MODE_REG, CRT_BLANKING},
> >          {INIT_END_LIST, NULL, 0}};
> > Now let's consider the quite common case where the reset pulse must
> > be held active for a significant amount of time.  In that case,
> > we'll use the INIT_DELAY type to insert the necessary delay.  As
> > with reads, the length component encodes the time value.
> > 
> > Let's assume that this is a really crappy CRT controller that only
> > polls(!) for RESET every 10 ms.  Ignoring the fact that it ought to
> > be thrown in the nearest trash heap, the start of the list would now
> > appear like this:
> >         {{IO_WORD_SET, CRT_MODE_REG, CRT_RESET | CRT_BLANKING},
> >          {INIT_DELAY | (10 * MILLISECONDS), NULL, 0},
> >          {IO_WORD_CLEAR, CRT_MODE_REG, CRT_RESET},
> >          ...
> > Hopefully, it won't be necessary to hard-code too many fixed delays
> > like this one.  If this same controller has a CRT_RESET_ACK status
> > bit that acknowledges the reset command, the following list segment
> > is much better:
> >         {{IO_WORD_SET, CRT_MODE_REG, CRT_RESET | CRT_BLANKING},
> >          {IO_BYTE_READ | (10 * MILLISECONDS), CRT_STATUS_REG, CRT_RESET_ACK},
> >           {CRT_RESET_ERROR, crt_error, CRT_RESET_ACK},
> >          {IO_WORD_CLEAR, CRT_MODE_REG, CRT_RESET},
> >          ...
> > The final list type of "Exec" is used as a catch-all for
> > initialization sequences that are too complex to encode in the list.
> > It is important to note that this facility should only be used if
> > necessary, and for a bare minimum of functionality, since Exec
> > prevents parallelization while a subroutine is executing.  It is
> > always better to avoid its use by creative list encoding, when it's
> > possible to do so.
> > 
> > However, let's imagine that we discover that our goofy CRT
> > controller has a bug: it is only correctly initialized if the
> > CRT_READY bit is high or the CRT_CLOCK_LOCKED bit is high, but not
> > if both are high.  In that case, the INIT_EXEC command will be used
> > to execute a short "crt_init_ok(delay)" routine, which might look
> > something like this:
> >         int crt_init_ok(WORD delay) {
> >             BYTE stat = input_byte(CRT_STATUS_REG);
> >             if (((stat & CRT_READY) != 0) ^
> >                 ((stat & CRT_CLOCK_LOCKED) != 0)) {
> >                 return(INIT_EXEC_DONE);      /* Go to next list element */
> >             }
> >             else if (delay == 0) {           /* Timeout flagged by FS3M */
> >                crt_error(CRT_INIT_ERROR);    /* Handle errors if needed */
> >                return(INIT_EXEC_EXIT_LIST);  /* Exit current init list */
> >         /* or: return(INIT_EXEC_RETRY_LIST);    Restart this init list */
> >             }
> >             else return(INIT_EXEC_RETRY);    /* Repeat the call again */
> >         }
> > As suggested, the INIT_EXEC construct supports bidirectional
> > communications between the list manager and external native
> > subroutines.  When it calls an external routine, the list manager
> > passes in the current delay value, which will steadily count down
> > with each invocation of the routine until reaching zero, or the
> > routine's return code tells the list manager to move on.  Each time
> > a subroutine executes, it has four possible states that it can
> > communicate back to the list manager, allowing for a limited degree
> > of error handling:
> >       * INIT_EXEC_DONE flags the list manager that this command has
> >         successfully completed, and that the next element of the
> >         init list should be executed.
> >       * INIT_EXEC_RETRY indicates that the command should be
> >         repeated.
> >       * INIT_EXEC_RETRY_LIST is used to attempt reinitialization of
> >         a device by reexecuting the entire contents of the current
> >         initialization list, starting with the first item in the
> >         list.
> >       * INIT_EXEC_EXIT_LIST is used when initialization has failed
> >         completely.  The remainder of the current initialization
> >         list will be ignored.
> > Continuing with the prior example, once the crt_init_ok() routine
> > exists, the init list becomes:
> >         {{IO_WORD_SET, CRT_MODE_REG, CRT_RESET | CRT_BLANKING},
> >          {IO_BYTE_READ | (10 * MILLISECONDS), CRT_STATUS_REG, CRT_RESET_ACK},
> >           {CRT_RESET_ERROR, crt_error, CRT_RESET_ACK},
> >          {IO_WORD_CLEAR, CRT_MODE_REG, CRT_RESET},
> >          {IO_WORD_WRITE, CRT_HSTART_REG, 0},
> >          {IO_WORD_WRITE, CRT_HSIZE_REG, 1024},
> >          ...
> >          {MEM_QWORD_WRITE | CRT_QWORDS, frame_buffer_ptr, 0},
> >          {INIT_EXEC | (10 * MILLISECONDS), crt_init_ok, 0},
> >          {IO_WORD_CLEAR, CRT_MODE_REG, CRT_BLANKING},
> >          {INIT_END_LIST, NULL, 0}};
> > It's worth pointing out that the INIT_DELAY and INIT_EXEC commands
> > are not the same!  Though the structure of an INIT_DELAY may appear
> > identical to an INIT_EXEC with a null routine pointer, there is an
> > important distinction that is used for greater control of
> > parallelization.  During an INIT_DELAY call, parallelization is
> > immediately enabled.  However, the INIT_EXEC call does not enable
> > parallelization on the first invocation of the target subroutine.
> > This makes implementing "critical sections" straightforward.
> > However, when an exec subroutine loops (which occurs when the
> > subroutine returns an INIT_EXEC_RETRY state), parallelization is
> > enabled in-between subsequent calls to the routine.  Therefore,
> > other hardware may simultaneously be initialized while the loop
> > continues to spin.  Conversely, when a small delay is necessary and
> > parallelization is undesirable, simply perform an INIT_EXEC with a
> > null subroutine pointer.
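> > 
> > In list form, the two flavors of a small delay therefore read (per
> > the semantics above; a sketch):
> > 
> >         /* Parallelizable: other lists may run during the wait. */
> >         {INIT_DELAY | (1 * MILLISECONDS), NULL, 0},
> > 
> >         /* Critical section: parallelization stays off while the
> >            (null) subroutine's timeout counts down. */
> >         {INIT_EXEC | (1 * MILLISECONDS), NULL, 0},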
> > 
> > 
> > 
> > 
> > Appendix B - Common Questions about Initialization Lists:
> > 
> > Q: Why go to all this work?
> > A: It actually isn't difficult to implement the list manager, and
> > once you've become accustomed to the technique, initialization lists
> > are very easy to create and maintain.
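> > 
> > As a rough illustration of how small the core can be, a
> > single-threaded interpreter for just the write-style commands might
> > read as follows (a sketch; the CMD_* masks and the io_read() and
> > io_write() helpers are hypothetical):
> > 
> >         void run_init_list(const struct init_entry *e)
> >         {
> >             for (; (e->type & CMD_MASK) != INIT_END_LIST; e++) {
> >                 switch (e->type & CMD_MASK) {
> >                 case CMD_WRITE:
> >                     io_write(e->type, e->addr, e->data);
> >                     break;
> >                 case CMD_SET:
> >                     io_write(e->type, e->addr,
> >                              io_read(e->type, e->addr) | e->data);
> >                     break;
> >                 case CMD_CLEAR:
> >                     io_write(e->type, e->addr,
> >                              io_read(e->type, e->addr) & ~e->data);
> >                     break;
> >                 /* Read, Delay and Exec handling omitted here. */
> >                 }
> >             }
> >         }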
> > 
> > Q: Why isn't If/Then/Else functionality directly supported?
> > A: Because an initialization list should encode a specific state.
> > Instead of adding an If/Then/Else clause, the list should instead
> > encode the precise outputs to implement the current state.  If the
> > state changes such that different I/Os are necessary, a different
> > list should be encoded.  It's also easy to split init lists into
> > multiple sequential init lists if only a few differences are present
> > in the init lists for different states.
> > 
> > Q: What about size?
> > A: The initialization lists are typically much smaller than an
> > equivalent code sequence.  On machines with restricted resources,
> > this can result in significant space savings.
> > 
> > Q: Isn't this slow?
> > A: Quite the opposite.  By compactly encoding device states, we
> > avoid calling many different device drivers, checking current device
> > states, etc.  Instead, the list manager blasts through the
> > initialization lists very quickly.  In fact, on many system
> > architectures, the cycle time for I/O writes is sufficiently long
> > that the list manager can often saturate the bus.  Since I/O writes
> > dominate most init lists, the execution time on such machines is
> > nearly optimal.  When parallelization is exploited via careful list
> > optimization, elapsed clock time can be orders of magnitude faster
> > than traditional approaches.
> > 
> > Q: Why not stick with code instead of the data list approach?
> > A: As above, using code in the current O.S. structure requires
> > sequentially executing each driver.  No parallelization across
> > devices can occur, leading to unnecessarily slow initialization
> > times.
> > 
> > Q: Are there any gotchas?
> > A: Yes.  Using a list manager that is capable of parallelization
> > requires a bit more thought than conventional single-threaded init
> > code.  In particular, you'll want to use caution when multiple
> > devices share a common I/O port, such as an interrupt controller.
> > Use of Set/Clear instead of Write, and simple planning regarding the
> > insertion of parallelization-enabling Delays or Exec loops are easy
> > ways to prevent problems.
> > 
> > Q: Supporting parallelization is too much work for me.
> > A: Try it before you toss it!  Still, if parallelization is
> > absolutely undesirable in a specific application, it's trivial to
> > use the list ID to force sequential execution of each list.  As long
> > as lists don't share the same major ID number, no parallelization
> > will occur.
> > 
> > Q: Where did this stuff come from?
> > A: Variants of init lists have been used for decades.  This specific
> > approach has its origins in techniques that were developed by Mark
> > Foster at Zenith Data Systems more than 20 years ago - they were
> > used to significantly improve structure and maintainability in that
> > firm's PC BIOS.
> > 
> > Q: Is this useful for anything other than this specialized power
> > management?
> > A: Absolutely!  When integrated into the boot ROM or BIOS, the
> > initialization list approach can significantly speed up the cold
> > boot process - just execute simple diagnostics routines from the
> > init lists when required.  Furthermore, the exact same list manager
> > and data structures may also be used to support fast resume when
> > desired.
> > 
> > 
> > 
> > Sincerely,
> > Mark J. Foster                         &   Jim Gettys
> > VP Engineering/Chief System Architect      VP Software Engineering
> > mfoster at laptop.org                         jg at laptop.org
> > 
> >                              One Laptop Per Child
> 
-- 
Jim Gettys
One Laptop Per Child




