[OLPC-devel] Thoughts on S3 resume.

Mark J. Foster mfoster at laptop.org
Sat Aug 5 13:12:12 EDT 2006


Hi, Folks!

Fast resume from S3 isn't at all a trivial issue.  The following is a 
brainstorming report that Jim and I wrote up back in April to help stir 
the pot.  In essence, the challenge is how to reinitialize, as quickly 
as possible, all of the system devices which have been powered down.  
In an effort to push the envelope, we've hypothesized one possible 
approach that could maximize resume speed.  Note that actually 
implementing this would be anything but trivial, since it touches 
"everything", and more importantly, we don't even know if it is 
necessary!  It is quite possible that the current resume infrastructure 
is fast enough not to need all of this work.  Nonetheless, what this 
white paper does do is help inform folks of some of the challenges 
associated with trying to enter S3 after perhaps one second of system 
inactivity... i.e. orders of magnitude sooner than current systems 
attempt it.

Please accept my apologies for the HTML content.

Cheers!
MarkF

> Rapid S3: A Brainstorming Proposal
>
> Introduction:
>
> The One Laptop Per Child (OLPC) initiative must utilize very 
> aggressive power management in order to deliver the most 
> cost-effective solution for our "$100 laptop."  A significant fraction 
> of school-age children lack power at home, most lack power in their 
> schools, and virtually none have it at their desks.  Moreover, when 
> power is available, it is often unreliable.  Human-powered machines 
> which do not require AC line power are therefore a necessity, rather 
> than a luxury.  To this end, our laptop must be highly 
> power-efficient, with an average power consumption target of less than 
> 2 watts, versus 10-30 watts in traditional machines.  Power management 
> is key.
>
> For various reasons, our first-generation power management 
> architecture will focus on extremely fast entry into, and exit from, 
> "Suspend To RAM" (known as "S3") mode - this is made possible via an 
> intelligent display controller that can continue to refresh the 
> display while the CPU is shut down.  Additionally, the chosen wireless 
> chip can continue to forward packets autonomously during suspend, with 
> the capability of waking the primary processor when required.  
> Further, since the system is Flash-based, there are no disk or disk 
> controller latencies to slow the process.
>
> Using S3 is certainly not unusual; what is unique is our goal of 
> completing S3 transitions in sub-second timeframes.  While good 
> profiling data is currently lacking, a number of embedded Linux 
> systems (e.g. the HP iPAQ and Nokia 770) can suspend on the order of 
> a second, and resume in a fraction of a second.  The asymmetry in 
> these devices' behavior is not yet understood; investigation has just 
> begun.
>
> Specifically, we'll actually be entering and exiting S3 between 
> keystrokes whenever there is a significant pause, similar 
> to the use of "stop-clock" mode on conventional notebooks.  This is a 
> challenging goal, yet one which we believe is attainable.  
> Accomplishing this task will, however, require the cooperation of many 
> members of the Open Source community.  In order to help define the 
> best solution, this document is intended to solicit feedback from 
> those folks who are most knowledgeable about the relevant code 
> subsystems, including the Linux kernel and device drivers, the Linux 
> ACPI interface, the X Window System architecture and drivers, 
> and the LinuxBIOS.
>
>
> First steps:
>
> Our first task will be to profile X Window System activity in order to 
> better quantify user behavior - this will help to determine the proper 
> delay before initiating suspend requests.  In addition, we'll need to 
> instrument the suspend/resume cycle and understand the current 
> asymmetry between suspend and resume on the systems that currently 
> complete the suspend/resume cycle most quickly. We do not understand 
> why, on the systems we've examined, suspend is so slow. Our current 
> best theory is that people care more about wakeup than suspend, and 
> that there is just "something dumb" going on during suspend. Anyone 
> with suitable instrumentation to create such profile information 
> (particularly for an x86 implementation) is encouraged to make the 
> information available to the community. 
>  
>
> Facts vs. Guesses:
>
> In many ways, this document is admittedly premature, since the OLPC 
> team does not (yet) have good measurement statistics for the time 
> durations that each of the relevant code subsystems require to enter 
> and exit S3.  Casual observation would suggest that most S3 
> implementations greatly exceed the necessary timeframes, but detailed 
> profile information for these transitions would be enormously 
> valuable, and would be greatly appreciated! 
>
> In the absence of suitable profiling data, the remainder of this 
> proposal makes the (quite possibly inappropriate) assumption that 
> existing code and interfaces may not be suitable to achieve the 
> performance that's necessary for the OLPC machine.  We therefore 
> recklessly propose some possible implementations, not because they're 
> optimal (or even rational), but simply to serve as a convenient means 
> of documenting the performance characteristics that we seek, and to 
> serve as a "brainstorming seed" for those who know better!  
> Throughout, specific code areas that are believed to be potentially 
> problematic are highlighted.  Feedback on any or all of 
> these issues will be very much appreciated by the OLPC team!
>
>
> Band-Aids and Duct Tape:
>
> Sadly, even though the OLPC laptop is a non-profit project, we're 
> still burdened with the egregious spectre of a tight schedule.  
> Despite our best wishes, we can't realistically aim for the ultimate 
> in structural enhancements, such as system-wide device driver 
> structures that provide parsable, completely generic device subsystem 
> interrelationship information, nor do we have the luxury of 
> significant changes to the system's existing hardware architecture.  
> Consequently, we, like the rest of the world, seek a happy (or at 
> least not entirely depressing) medium.
>
> As one possibility, what about lying?  To what extent might the 
> inherent structural challenges be ameliorated by claiming that we're 
> entering S1 (stop-clock) state, instead of S3 (suspend) state, while 
> playing a few device driver games?  What hidden demons might lurk 
> within such a disreputable approach?  Alternately, might it be best to 
> just live in a fantasy/dream state, where we mystically maintain and 
> restore device state information in such a manner as to tell ACPI 
> little to nothing (i.e. far less than it expects)?
>
>
> The Goal:
>
> Our target response to an input event (keyboard or mouse) is 200 ms, 
> as that results in a minimally perceptible response delay.  CHI 
> research over the years shows that humans are less sensitive to 
> keyboard delays than to other delays (e.g. rubber banding or dragging 
> operations in user interfaces).  Due to the nature of the AMD GX2 CPU 
> platform at the core of the machine, 150 ms of this time is already 
> consumed by the CPU's reset delay, so only 50 ms are available for 
> code execution to satisfy the 200 ms target.  At this latency, 
> research suggests the delay will be perceptible, but not highly 
> objectionable.  Therefore, the specific goal for exiting S3 is 50 ms.  
> This reflects the total elapsed clock time from the CPU fetching the 
> reset vector until the kernel hands off control to a user-space 
> application.  In the future, on hardware platforms that are 
> explicitly optimized for fast S3, significantly lower targets would 
> be desirable.
>
> Entry into S3 is also a challenge, given fundamental requirements such 
> as activity detection.  Fortunately, the hardware component of entry 
> into S3 should only be on the order of 10 ms.  In the absence of hard 
> data, we'll arbitrarily pick a goal for entry into S3 of 100 ms.  This 
> would yield an effective clock time of 90 ms available for activity 
> detection timeout, saving the state of the machine into RAM, and 
> execution of the S3 transition code.  In the future, we'd like to be 
> able to make such a transition in as little as 20 ms, at which point 
> the delays will usually be below human perception.  So for our initial 
> goal, we're shooting for five times the ideal.
>
>
> Secret Sauce:
>
> It's worth noting a couple of important factors that significantly 
> assist in meeting these performance goals.  First, the 
> OLPC machine utilizes a NAND Flash-based JFFS2 filesystem, so 
> conventional IDE drives are not an issue.  Second, as suggested 
> previously, it is not necessary to save or restore the state of the 
> display.  A custom ASIC provides continuous screen refresh whether or 
> not the CPU is powered-on, even though the OLPC machine's GX2 
> processor utilizes a Unified Memory Architecture for the frame 
> buffer.  Therefore, low-level video code will not be a significant 
> factor (further details regarding this ASIC are beyond the scope of 
> this document).
> In addition, the Marvell wireless chip keeps the wireless network 
> alive and able to wake the processor when packets are addressed to it 
> (while continuing to forward packets in the mesh autonomously).
>
> Problems:
>
> Today, many drivers utilize somewhat arbitrary hardware wake-up 
> delays, sometimes for good reasons, and sometimes for bad reasons.  
> Fortunately, our machine lacks some of the "features" found on most 
> laptops that aggravate the problem (e.g. IDE controllers and disk 
> devices).  We will obviously tune the delays in the drivers for the 
> specific instance of hardware we have, and as we plan to use 
> LinuxBIOS, we are free to work at all layers of the stack.
>
> Presuming that we cannot meet the above goals "out of the box", we 
> have several alternatives:
>
>
> Solution 1:
> -----------
>
> Modify the power management system so that, whenever a significant 
> delay is reached in one driver, work can proceed on the state 
> transitions of drivers for other hardware.  Obviously, this will have 
> to respect bus and device topology in the usual fashion.  With luck, 
> this *might* get us to our initial goal; we wonder, however, whether 
> it can get us to our longer-term ideal.
>
>
> Straightforward... but?:
>
> The most obvious implementation method would be to use the existing 
> ACPI structure to implement the OLPC power architecture.  However, we 
> are concerned, based solely on user-level observation of existing 
> implementations, that this may be insufficient for the task.  At the 
> macro level, we observe seconds-level delays during most transitions 
> into S3.  Potentially problematic components of this delay may include:
>
>    1. Activity detection.  It may well be that current implementations
>       wait for a substantial window of time for process execution to
>       complete.  If so, a more efficient means of activity detection
>       is necessary.
>    2. Serial execution.  The current structure would seem to require
>       serial execution of the suspend code in each of the system's
>       device drivers, even though it is theoretically possible to
>       parallelize this execution in most cases (obvious exceptions to
>       parallelization include drivers for bus bridges, etc).
>    3. Conservative device drivers.  In order to maximize portability,
>       it is quite common to see delays on the order of hundreds of
>       milliseconds in device driver code to ensure device inactivity. 
>       While this is quite reasonable given the current usage of S3, it
>       will be necessary to conform much more closely to data sheet
>       minimums on the OLPC machine to deliver the necessary
>       performance targets.
>    4. Sequential delays.  Due to items (2) and (3), substantial time
>       appears to be wasted on delays that might otherwise be
>       concurrent.  For example, if there are 10 devices in the system,
>       each of which requires a delay of 20 ms, the current structure
>       would seem to mandate 200 ms of clock time for concatenated
>       delays, when a single 20 ms parallel delay should be sufficient.
>    5. Unnecessarily complex state capture.  Hardware isn't always
>       friendly, and many drivers have to wind up doing strange or
>       unusual tricks to record the current state of a device.
>    6. Assumptions of PC compatibility requirements - closely related
>       to item (5).  Thank goodness, much less of this is seen on Linux
>       than with other platforms, but device drivers which originate on
>       the PC platform often inherit the habit of recovering the
>       current device state from hardware, even though it would be far
>       faster to maintain state in the device driver itself.
>    7. Unnecessary state saves.  As before, there is no need to save
>       the frame buffer contents of the OLPC machine.  In a similar
>       vein, it is unnecessary to save the state of the OLPC's wireless
>       network, for it stays alive (thanks to an on-chip CPU and
>       embedded RAM) even if the primary CPU is powered off.
>
> On highly portable machines which have received special optimization, 
> exit from S3 is considerably faster than entry, but still appears to 
> be far slower than is needed for the OLPC machine.  
> Components of this delay might include:
>
>    1. Assumed "power stabilization" delays.  There are, in fact, many
>       machines which do require hard delays before they are fully
>       functional.  On the OLPC machine, no such delay will be
>       necessary.  Once execution begins, full speed ahead!
>    2. Serial execution.  As with suspend requests, it would appear
>       that sequential driver execution is mandated during system
>       resume, despite the fact that most drivers could be executed
>       concurrently, if sufficient information were somehow made
>       available regarding the structure of, and interaction between,
>       device subsystems.
>    3. Conservative device drivers.  During resumes / exit from S3,
>       device driver delays are even more common, and more prolonged. 
>       Again, this is understandable in pursuit of maximum portability,
>       but is problematic when high performance is required.
>    4. Sequential delays.  Ditto.  Initialization / reset delays are an
>       even greater concern than suspend delays.
>    5. Unnecessary delays.  In cases such as IDE initialization, it's
>       very common to see extremely long delays for device spin-up.  In
>       a machine with no rotating drives, this is clearly irrelevant.
>    6. Polling for nonexistent devices.  Since the OLPC machine has a
>       fixed set of predefined peripherals, time spent searching for
>       peripherals that "might exist" on more expandable machines is an
>       expensive waste.  The only exception is the OLPC machine's three
>       USB 2.0 ports, which will undoubtedly require special treatment.
>    7. Unnecessary state restores.  Again, it is unnecessary to restore
>       the state of the display or wireless subsystems.  Today, tens or
>       hundreds of milliseconds are often spent repainting the screen
>       after a resume, even though this is unnecessary (and very
>       undesirable) on the OLPC machine.  Handling applications which
>       perform "frivolous" writes (such as on-screen clocks or even
>       worse, animated GIFs) will require a bit more care.  Handling
>       wireless resume in a typical fashion would be a gargantuan
>       mistake, since repeating a DHCP request, and reinitializing the
>       TCP/IP stack once or twice each second would be quite
>       problematic, to say the least!
>
>
> Solution 2:
> -----------
>
> To help provide a more concrete basis for discussion and flames, we'll 
> next discuss one possible technique for achieving very fast S3 
> transitions.  To do this, we'll hypothesize a new core "Fast S3 
> Management" (FS3M) driver.  Located either within the system's BIOS, 
> or as a standard Linux device driver, FS3M's primary goals include:
>
>     * Serving as a central repository for a machine's current state  -
>       including the state of /all/ devices.
>     * Providing an efficient interface which allows device drivers to
>       "register" the current device state whenever a state change occurs.
>     * Delivering an efficient means of rapidly restoring a machine to
>       a previously recorded state.
>     * Creating a lightweight method for enabling the parallelization
>       of device initialization / state restoration in a
>       straightforward, easy to control manner.
>     * Utilizing generic data structures that compactly encode device
>       and machine state.
>
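> To make these goals concrete, FS3M's entire interface might be as 
> small as the following sketch (only update_init_list() and 
> delete_init_list() appear later in this proposal; the other names, 
> and all of the signatures, are our invention):
>
>     /* A hypothetical FS3M interface surface.  DWORD is assumed to
>        be a 32-bit unsigned integer. */
>     void update_init_list(DWORD id, const void *list); /* add/replace */
>     void delete_init_list(DWORD id);                   /* remove */
>     void fs3m_suspend(void); /* trivial: lists are already current */
>     void fs3m_resume(void);  /* replay all init lists in ID order */
>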
> FS3M as envisioned here would contain a sorted array of 
> "initialization lists" (or "init lists").  Conceptually, an init list 
> is simply a list of memory or I/O ports to be written, which also 
> contains the data to be written to each port.  A single init list may 
> contain all of a device's state, or only a portion of it; however, 
> it's easiest to think of one init list as containing the state of a 
> single device, and as therefore providing all of the information 
> that's necessary to restore the state of that device.
>
> Throughout this document, the terms "restore" and "initialize" are 
> essentially used interchangeably.  This is intentional.  A properly 
> defined init list should be capable of initializing a device to a 
> given state at any point in time.  At power-up, we can think of this 
> process as initialization, but since the initialization is to a 
> previously recorded state, it is identical to "restoring" that 
> previous state.
>
> In the real world, device initialization often requires operations 
> that are more complex than basic I/O or memory writes.  It is 
> therefore important to be able to encode a richer set of information 
> that better satisfies the needs of most devices.  Rather than abandon 
> the high-level picture to elaborate at this point, Appendix A contains 
> a reasonably detailed proposal which identifies the broader set of 
> capabilities that FS3M's init lists should support.  For now, just 
> remember that init lists are simply lists that encode device state.
>
> Since a machine contains multiple devices, its state may be encoded 
> via multiple init lists.  A key problem, however, is identifying a 
> means of organizing multiple init lists in a manner which captures 
> the actual dependencies of one device upon another: memory 
> controllers must be initialized before memory may be written, bus 
> bridges must be initialized before devices located on that bus can be 
> initialized, etc.  To restate the issue: as long as the collection of 
> init lists is executed in the proper sequence, it will be possible to 
> deterministically initialize a machine to a specific known state - 
> i.e. to restore it to that state.
>
> Due to the enormous variety of system architectures, it is not 
> generally possible to automatically determine the proper location of a 
> device within a machine's device hierarchy.  Therefore, FS3M relies on 
> the programmer to provide simple ID numbers that specify how a 
> specific init list relates to other init lists.  Further, this ID 
> information is used to identify when multiple devices are at the same 
> point in the device hierarchy.  This specific relationship is quite 
> important in FS3M, for devices with the same position in the hierarchy 
> may be initialized in parallel, which can dramatically accelerate the 
> process of restoring a machine's state.
>
> Before proceeding, it's important to emphasize that, while a 
> conventional hardware device hierarchy and the init list hierarchy are 
> similar, they are not identical.  FS3M's hierarchy is solely 
> determined by the potential for simultaneous initialization of 
> different init lists -- it is the programmer's responsibility to 
> ensure that init lists which cannot be executed in parallel use IDs 
> that prevent them from being so executed.
>
> FS3M's hierarchy is extremely simple.  Each init list has two ID 
> values: a "major ID" and a "minor ID".  Those init lists with 
> identical major IDs are considered to be at the same point in the 
> hierarchy, and the minor ID value simply differentiates between lists 
> for administrative purposes (such as knowing when a new init 
> list/state should replace an existing init list/state).  To illustrate 
> how this hierarchy may differ from a normal hardware hierarchy, 
> devices at completely different levels within the hardware hierarchy 
> may share the same major ID: for instance, a PCI bridge and an on-chip 
> video controller might share the same hierarchy level as an init list 
> which encodes the state of a status display.
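>
> As a sketch, assume (purely for illustration) that the single id 
> passed to FS3M packs the two values:
>
>     /* Hypothetical packing of major/minor IDs into one list ID.
>        The list names below are invented. */
>     #define LIST_ID(major, minor)  (((major) << 16) | (minor))
>
>     update_init_list(LIST_ID(1, 0), mem_ctrl_list);    /* runs first */
>     update_init_list(LIST_ID(2, 0), pci_bridge_list);  /* these three */
>     update_init_list(LIST_ID(2, 1), video_list);       /*  may run in */
>     update_init_list(LIST_ID(2, 2), status_disp_list); /*  parallel   */
>     update_init_list(LIST_ID(3, 0), usb_list);         /* runs last   */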
>
> This structure leads to a simple conclusion: the most important part 
> of using init lists is the assignment of ID numbers.  At one end of 
> the spectrum, assigning every init list the same major ID will lead 
> to a complete mess, as devices which interrelate are initialized at 
> the same time as the bus bridges which connect them to the CPU.  At 
> the other extreme, assigning a different major ID to each init list 
> will prevent any opportunity for parallelization, leading to 
> unnecessarily slow initialization.  You might wish to start this way 
> for simplicity during initial debugging, but it should be optimized 
> once the initialization values are known to be correct.  In the 
> authors' opinion, the easiest method of determining the proper 
> hierarchy levels is to visually graph the different init lists in the 
> system, and to rearrange them until a reasonably efficient hierarchy 
> is determined.
>
> A further level of optimization can also be quite beneficial when high 
> performance is required: splitting init lists.  Consider the case 
> where two devices are located at the same point in the hierarchy, but 
> one device requires only 50 us to initialize, while the second device 
> requires a 10 ms delay during its initialization sequence.  Since FS3M 
> cannot proceed to the next level in the hierarchy until all devices in 
> the current level have been initialized, this case would result in an 
> unnecessary ~9.95 ms delay.  To resolve this, it is relatively 
> straightforward to split the init list that requires the long delay 
> into two components: one list that performs the first section of the 
> initialization prior to the delay, and a second list that performs the 
> remainder of the initialization at a different hierarchy level.  
> However, it is important to note a significant caveat when splitting 
> init lists.  As currently envisioned, FS3M will not include provisions 
> for relating init lists at different hierarchy levels by time.  In 
> other words, when removing a long delay from an init list and 
> splitting the list at the delay point, it is the programmer's 
> responsibility to ensure that execution of intermediate init lists at 
> lower hierarchy levels will inherently provide sufficient delay before 
> the second component of the device's init list is executed.  If not 
> thought through, this could make maintenance more challenging.  Once 
> again, careful assignment of IDs is key to delivering the highest 
> performance from init lists.
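>
> Using the hypothetical LIST_ID packing from above, such a split might 
> look like this (the list names are again invented):
>
>     /* The slow device's reset is started at level 2; the level-3
>        lists must inherently take >= 10 ms to execute; the rest of
>        the slow device's setup then completes at level 4. */
>     update_init_list(LIST_ID(2, 0), slow_dev_start_list);
>     update_init_list(LIST_ID(3, 0), other_devices_list);
>     update_init_list(LIST_ID(4, 0), slow_dev_finish_list);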
>
>
> FS3M In Operation:
>
> If FS3M were completely integrated into a system, it would be 
> incorporated into the system's boot ROM/BIOS, and that ROM would 
> contain hard-coded default init lists that bring the machine up to its 
> power-up state.  Some of these init lists might be quite short - even 
> two lines long - these would typically be lists that execute suitable 
> native machine-language diagnostics at the appropriate points in 
> time.  Just before executing the system's boot loader, the ROM would 
> inform FS3M (via delete_init_list(id) calls) to remove the 
> diagnostics-related init lists, thus leaving FS3M with a complete set 
> of init lists that properly describe the power-up state, without the 
> need to rerun diagnostics.
>
> Alternately, FS3M may be implemented as a core O.S. device driver, 
> though in this form, it obviously can't speed up the cold boot 
> process.  In either ROM-based or device-driver-based form, as 
> execution proceeds, (other) device drivers will call FS3M's 
> update_init_list(id, list_ptr) function with new initialization 
> information each time the state of a device changes.  Realistically, 
> since the actual machine state is changing every few nanoseconds, some 
> degree of judgment is required to determine when to call 
> update_init_list().  For example, it might be overkill to call 
> update_init_list() every time the cursor is updated.  If so, it's 
> possible to special-case high-frequency state changes such as this one 
> by polling the device state and executing the relevant 
> update_init_list() call as a component of the suspend code.  However, 
> this technique must be used sparingly, or else entry into S3 becomes 
> too slow.  In any event, each device driver may select the update rate 
> which is most appropriate for that device.
>
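> As a sketch of the driver side (the register names reuse the invented 
> CRT example from Appendix A; output_word(), the index constants, and 
> the init_entry element type sketched in Appendix A are all 
> assumptions):
>
>     static init_entry crt_list[CRT_LIST_LEN]; /* this device's list */
>
>     void crt_set_size(WORD hstart, WORD hsize) {
>         /* Program the hardware as usual... */
>         output_word(CRT_HSTART_REG, hstart);
>         output_word(CRT_HSIZE_REG, hsize);
>         /* ...then mirror the new state into the init list. */
>         crt_list[CRT_HSTART_IDX].data = hstart;
>         crt_list[CRT_HSIZE_IDX].data = hsize;
>         update_init_list(LIST_ID(CRT_MAJOR, 0), crt_list);
>     }
>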
> With the exception of the special cases noted above, FS3M always has a 
> complete record of the current state of the machine, so entry into S3 
> is extremely fast.  Upon resume, FS3M's init list manager uses this 
> state information to restore the entire state of the machine as 
> quickly as possible, in a manner that is completely transparent to all 
> other device drivers.  In fact, if an external entry point is 
> supported, it's essentially possible for FS3M to suspend and resume 
> transparently to the operating system.  As suggested in the "Band-Aids 
> and Duct Tape" section, some of our key questions relate to the degree 
> to which we should hide these fast S3 transitions, or whether it would 
> be better to integrate them with the current Linux power management 
> architecture.  Feedback from the community is particularly important 
> regarding this decision.
>
>
> Summary:
>
> Power management is critically important to the success of the One 
> Laptop Per Child project.  Within this paper, we've presented both 
> straightforward extensions to the current power management 
> infrastructure and a more radical approach that focuses on higher 
> performance.  We seek the involvement of the Open Source 
> community to find the best solution for our machine - please let us 
> know what you think!
>
>
>
>
> Appendix A - Initialization Lists:
>
> As mentioned previously, FS3M is envisioned to use a sorted array of 
> initialization lists.  While there are many other ways of implementing 
> this functionality, the following section suggests one possible 
> approach for structuring the data that's needed to efficiently record 
> and restore device state - it is only included here to show the 
> relative simplicity and compactness of the list-based approach.
>
> In this example, each one of the FS3M initialization lists would 
> contain a sequence of <type>,<address>,<data> tuples.  Each specific 
> list will nominally contain the I/O and memory cycles that are 
> necessary to initialize a specific device, where a "device" matches 
> the device driver's definition of one device.  The <type> field would 
> be bit-mapped to define:
>
>     * The address space: memory or I/O
>     * The data width: BYTE/WORD/DWORD/QWORD
>     * Command type, including: Set bits, Clear bits, Write bits, Read
>       bits, Delay, Exec subroutine, or End list.
>     * Length field.  On Writes, this field is used to support block
>       writes.  During Read or Exec commands, the length field is cast
>       as a delay/timeout field.  The length field is ignored during
>       the read-modify-write commands of Set bits and Clear bits, and
>       during the End list command.  To improve source code clarity, a
>       length value of 0 is equivalent to specifying a value of 1.
>
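> In C, such a tuple might be declared roughly as follows (the field 
> names and widths are our guesses, not a settled interface):
>
>     typedef unsigned long DWORD;    /* 32 bits on the target GX2 */
>
>     typedef struct init_entry {
>         DWORD type;    /* bit-mapped: space, width, command, length */
>         DWORD address; /* I/O port, memory address, or routine ptr */
>         DWORD data;    /* write data, bit mask, or parameter */
>     } init_entry;
>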
> As an example, a list for initializing a simplistic XGA CRT controller 
> might appear something like this:
>
>     {{IO_WORD_SET, CRT_MODE_REG, CRT_RESET | CRT_BLANKING},
>      {IO_WORD_CLEAR, CRT_MODE_REG, CRT_RESET},
>      {IO_WORD_WRITE, CRT_HSTART_REG, 0},
>      {IO_WORD_WRITE, CRT_HSIZE_REG, 1024},
>      ...
>      {MEM_QWORD_WRITE | CRT_QWORDS, frame_buffer_ptr, 0},
>      {IO_WORD_CLEAR, CRT_MODE_REG, CRT_BLANKING},
>      {INIT_END_LIST, NULL, 0}};
>
> Simple enough so far.  All we've done is to replace I/O and memory 
> initialization code with a data table.  Sometimes, though, it's 
> necessary to poll a status bit during device initialization.  In this 
> case, when a list entry is a "Read bits" command, two data structure 
> changes occur.  First, the entry's length field is interpreted as a 
> timeout value.  In addition, the subsequent list entry is recast as an 
> <error_routine_parameter>,<error_routine_addr>,<bit_mask> tuple.
>
> Let's imagine that the CRT's status port contains an active-high 
> CRT_READY bit and an active-low CRT_CLOCK_LOCKED bit, and we want a 5 
> ms timeout - if the proper status is not returned within 5 ms, the 
> list should call crt_error(CRT_INIT_ERROR).  The CRT init list might 
> now look like this:
>
>     {{IO_WORD_SET, CRT_MODE_REG, CRT_RESET | CRT_BLANKING},
>      {IO_WORD_CLEAR, CRT_MODE_REG, CRT_RESET},
>      {IO_WORD_WRITE, CRT_HSTART_REG, 0},
>      {IO_WORD_WRITE, CRT_HSIZE_REG, 1024},
>      ...
>      {MEM_QWORD_WRITE | CRT_QWORDS, frame_buffer_ptr, 0},
>      {IO_BYTE_READ | (5 * MILLISECONDS), CRT_STATUS_REG, CRT_READY},
>       {CRT_INIT_ERROR, crt_error, CRT_READY | CRT_CLOCK_LOCKED},
>      {IO_WORD_CLEAR, CRT_MODE_REG, CRT_BLANKING},
>      {INIT_END_LIST, NULL, 0}};
>
> Now let's consider the quite common case where the reset pulse must be 
> held active for a significant amount of time.  In that case, we'll use 
> the INIT_DELAY type to insert the necessary delay.  As with reads, the 
> length component is interpreted as a timeout.
>
> Let's assume that this is a really crappy CRT controller that only 
> polls(!) for RESET every 10 ms.  Ignoring the fact that it ought to be 
> thrown in the nearest trash heap, the start of the list would now 
> appear like this:
>
>     {{IO_WORD_SET, CRT_MODE_REG, CRT_RESET | CRT_BLANKING},
>      {INIT_DELAY | (10 * MILLISECONDS), NULL, 0},
>      {IO_WORD_CLEAR, CRT_MODE_REG, CRT_RESET},
>      ...
>
> Hopefully, it won't be necessary to hard-code too many fixed delays 
> like this one.  If this same controller has a CRT_RESET_ACK status bit 
> that acknowledges the reset command, the following list segment is 
> much better:
>
>     {{IO_WORD_SET, CRT_MODE_REG, CRT_RESET | CRT_BLANKING},
>      {IO_BYTE_READ | (10 * MILLISECONDS), CRT_STATUS_REG, CRT_RESET_ACK},
>       {CRT_RESET_ERROR, crt_error, CRT_RESET_ACK},
>      {IO_WORD_CLEAR, CRT_MODE_REG, CRT_RESET},
>      ...
>
> The final list type of "Exec" is used as a catch-all for 
> initialization sequences that are too complex to encode in the list.  
> It is important to note that this facility should only be used if 
> necessary, and for a bare minimum of functionality, since Exec 
> prevents parallelization while a subroutine is executing.  It is 
> always better to avoid its use through creative list encoding when 
> possible.
>
> However, let's imagine that we discover that our goofy CRT controller 
> has a bug, and has only been correctly initialized if the CRT_READY 
> bit is high or the CRT_CLOCK_LOCKED bit is high, but not if both are 
> high.  In that case, the INIT_EXEC command will be used to execute a 
> short "crt_init_ok(delay)" routine, which might look something like this:
>
>     int crt_init_ok(WORD delay) {
>         BYTE stat = input_byte(CRT_STATUS_REG);
>         if (((stat & CRT_READY) != 0) ^
>             ((stat & CRT_CLOCK_LOCKED) != 0)) {
>             return(INIT_EXEC_DONE);       /* Go to next list element */
>         }
>         else if (delay == 0) {            /* Timeout flagged by FS3M */
>             crt_error(CRT_INIT_ERROR);    /* Handle errors if needed */
>             return(INIT_EXEC_EXIT_LIST);  /* Exit current init list */
>     //or:   return(INIT_EXEC_RETRY_LIST); /* Restart this init list */
>         }
>         else return(INIT_EXEC_RETRY);     /* Repeat the call again */
>     }
>
> As suggested, the INIT_EXEC construct supports bidirectional 
> communications between the list manager and external native 
> subroutines.  When it calls an external routine, the list manager 
> passes in the current delay value, which will steadily count down with 
> each invocation of the routine until reaching zero, or the routine's 
> return code tells the list manager to move on.  Each time a subroutine 
> executes, it has four possible states that it can communicate back to 
> the list manager, allowing for a limited degree of error handling:
>
>     * INIT_EXEC_DONE flags the list manager that this command has
>       successfully completed, and that the next element of the init
>       list should be executed.
>     * INIT_EXEC_RETRY indicates that the command should be repeated.
>     * INIT_EXEC_RETRY_LIST is used to attempt reinitialization of a
>       device by reexecuting the entire contents of the current
>       initialization list, starting with the first item in the list.
>     * INIT_EXEC_EXIT_LIST is used when initialization has failed
>       completely.  The remainder of the current initialization list
>       will be ignored.
>
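> As a sketch, the list manager's handling of a single Exec entry might 
> look like the following (LENGTH_MASK, exec_fn, and 
> yield_to_parallel_lists() are our inventions, and the cast from the 
> address field to a function pointer is an assumption; only the four 
> return codes above come from this proposal):
>
>     typedef int (*exec_fn)(WORD delay);
>
>     /* Returns 0 to continue with the next entry, -1 to abandon the
>        list, and +1 to restart the list from its first entry. */
>     static int run_exec_entry(const init_entry *e) {
>         exec_fn fn = (exec_fn)e->address;
>         WORD delay = (WORD)(e->type & LENGTH_MASK); /* timeout */
>         for (;;) {
>             switch (fn(delay)) {
>             case INIT_EXEC_DONE:       return 0;
>             case INIT_EXEC_EXIT_LIST:  return -1;
>             case INIT_EXEC_RETRY_LIST: return +1;
>             default: /* INIT_EXEC_RETRY */
>                 yield_to_parallel_lists(); /* others may run now */
>                 if (delay > 0) delay--;    /* count down to zero */
>             }
>         }
>     }
>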
> Continuing with the prior example, once the crt_init_ok() routine 
> exists, the init list becomes:
>
>     {{IO_WORD_SET, CRT_MODE_REG, CRT_RESET | CRT_BLANKING},
>      {IO_BYTE_READ | (10 * MILLISECONDS), CRT_STATUS_REG, CRT_RESET_ACK},
>       {CRT_RESET_ERROR, crt_error, CRT_RESET_ACK},
>      {IO_WORD_CLEAR, CRT_MODE_REG, CRT_RESET},
>      {IO_WORD_WRITE, CRT_HSTART_REG, 0},
>      {IO_WORD_WRITE, CRT_HSIZE_REG, 1024},
>      ...
>      {MEM_QWORD_WRITE | CRT_QWORDS, frame_buffer_ptr, 0},
>      {INIT_EXEC | (10 * MILLISECONDS), crt_init_ok, 0},
>      {IO_WORD_CLEAR, CRT_MODE_REG, CRT_BLANKING},
>      {INIT_END_LIST, NULL, 0}};
>
> It's worth pointing out that the INIT_DELAY and INIT_EXEC commands are 
> not the same!  Though the structure of an INIT_DELAY may appear 
> identical to an INIT_EXEC with a null routine pointer, there is an 
> important distinction that is used for greater control of 
> parallelization.  During an INIT_DELAY call, parallelization is 
> immediately enabled.  However, the INIT_EXEC call does not enable 
> parallelization on the first invocation of the target subroutine.  
> This makes implementing "critical sections" straightforward.  However, 
> when an exec subroutine loops (which occurs when the subroutine 
> returns an INIT_EXEC_RETRY state), parallelization is enabled 
> in-between subsequent calls to the routine.  Therefore, other hardware 
> may simultaneously be initialized while the loop continues to spin.  
> Conversely, when a small delay is necessary and parallelization is 
> undesirable, simply perform an INIT_EXEC with a null subroutine pointer.
>
>
>
>
> Appendix B - Common Questions about Initialization Lists:
>
> Q: Why go to all this work?
> A: It actually isn't difficult to implement the list manager, and once 
> you've become accustomed to the technique, initialization lists are 
> very easy to create and maintain.
>
> Q: Why isn't If/Then/Else functionality directly supported?
> A: Because an initialization list should encode a specific state.  
> Instead of adding an If/Then/Else clause, the list should instead 
> encode the precise outputs to implement the current state.  If the 
> state changes such that different I/Os are necessary, a different list 
> should be encoded.  It's also easy to split init lists into multiple 
> sequential init lists if only a few differences are present in the 
> init lists for different states.
>
> Q: What about size?
> A: The initialization lists are typically much smaller than an 
> equivalent code sequence.  On machines with restricted resources, this 
> can result in significant space savings.
>
> Q: Isn't this slow?
> A: Quite the opposite.  By compactly encoding device states, we avoid 
> calling many different device drivers, checking current device states, 
> etc.  Instead, the list manager blasts through the initialization 
> lists very quickly.  In fact, on many system architectures, the cycle 
> time for I/O writes is sufficiently long that the list manager can 
> often saturate the bus.  Since I/O writes dominate most init lists, 
> the execution time on such machines is nearly optimal.  When 
> parallelization is exploited via careful list optimization, elapsed 
> clock time can be orders of magnitude faster than traditional approaches.
>
> Q: Why not stick with code instead of the data list approach?
> A: As above, using code in the current O.S. structure requires 
> sequentially executing each driver.  No parallelization across devices 
> can occur, leading to unnecessarily slow initialization times.
>
> Q: Are there any gotchas?
> A: Yes.  Using a list manager that is capable of parallelization 
> requires a bit more thought than conventional single-threaded init 
> code.  In particular, you'll want to use caution when multiple devices 
> share a common I/O port, such as an interrupt controller.  Use of 
> Set/Clear instead of Write, and simple planning regarding the 
> insertion of parallelization-enabling Delays or Exec loops are easy 
> ways to prevent problems.
>
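> For instance, if two devices' init lists each touch a shared 
> interrupt mask register in parallel (the register and bit names here 
> are invented):
>
>     {IO_BYTE_SET,   IRQ_MASK_REG, DEV_A_IRQ}, /* in device A's list */
>     {IO_BYTE_CLEAR, IRQ_MASK_REG, DEV_B_IRQ}, /* in device B's list */
>
> The read-modify-write Set and Clear commands leave the other device's 
> bits intact, where plain Write commands could race and clobber them.
>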
> Q: Supporting parallelization is too much work for me.
> A: Try it before you toss it!  Still, if parallelization is absolutely 
> undesirable in a specific application, it's trivial to use the list ID 
> to force sequential execution of each list.  As long as lists don't 
> share the same major ID number, no parallelization will occur.
>
> Q: Where did this stuff come from?
> A: Variants of init lists have been used for decades.  This specific 
> approach has its origins in techniques that were developed by Mark 
> Foster at Zenith Data Systems more than 20 years ago - they were used 
> to significantly improve structure and maintainability in that firm's 
> PC BIOS.
>
> Q: Is this useful for anything other than this specialized power 
> management?
> A: Absolutely!  When integrated into the boot ROM or BIOS, the 
> initialization list approach can significantly speed up the cold boot 
> process - just execute simple diagnostics routines from the init lists 
> when required.  Furthermore, the exact same list manager and data 
> structures may also be used to support fast resume when desired.
>
>
>
> Sincerely,
> Mark J. Foster                         &   Jim Gettys
> VP Engineering/Chief System Architect      VP Software Engineering
> mfoster at laptop.org                         jg at laptop.org
>
>                              One Laptop Per Child 
