Automated testing of activities

Wed Jul 18 11:20:15 EDT 2007

On Wed, 2007-07-18 at 06:35 -0700, Kent Quirk wrote:
> Please don't read this as me objecting to the concepts of automated 
> testing or accessibility support -- I'm generally in favor of both. But 
> they can have pretty major implications, especially on schedules.

Yup.  In this case, I'm trying to help people just *begin* to think
about what we need in the longer term for both testing and
accessibility.  This thread is not a "change course right now" request
at all, and I don't think it will affect us much between now and initial
deployment.  It's more a reaction to what we're learning now that we have
started to get into serious power management measurement, and to where
we should be headed long term.

> 
> Many questions:
> 
> When you say "accessibility" you mean "support for vision-impaired 
> users"? 

Yes, among others: the usual Federal requirements.  Note that other
governments often follow the US lead in this area.  For those of you not
aware of them, they are "sane" regulations (or they would get ignored);
sometimes the right answer is for impaired people to use different
hardware, and/or additional assistive hardware and software, than
unimpaired people.  The regulations don't say that people with arbitrary
impairments must be able to use every application on any and all
computers in the world.

> Large fonts? Zoomable screen? Will there be a screen reader application
> for the system? Where is it coming from, and how will it interact with 
> activities?

Note that most of our technology (though not the recent OLPC additions)
has latent support for accessibility (e.g. ATK).  Those hooks will aid
our testing as well.  Here's where much of it will come from:
http://developer.gnome.org/projects/gap/

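To illustrate why these hooks help testing, here is a minimal Python sketch, assuming a hypothetical in-memory widget tree. The Accessible class, find_by_name and build_ui are invented for illustration; they are not the ATK or AT-SPI API, and the role strings merely resemble ATK role names. A script that locates a widget by role and name keeps working when the layout is rescaled, whereas one that clicks at fixed X/Y coordinates would not.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Accessible:
    """Hypothetical stand-in for a node in an ATK-style accessibility tree."""
    role: str
    name: str
    x: int = 0
    y: int = 0
    children: list[Accessible] = field(default_factory=list)


def find_by_name(root: Accessible, role: str, name: str) -> Optional[Accessible]:
    """Depth-first search for the first node with the given role and name."""
    if root.role == role and root.name == name:
        return root
    for child in root.children:
        hit = find_by_name(child, role, name)
        if hit is not None:
            return hit
    return None


def build_ui(scale: int) -> Accessible:
    """Hypothetical activity toolbar laid out at a given scale factor."""
    share = Accessible("push button", "Share", x=40 * scale, y=10 * scale)
    stop = Accessible("push button", "Stop", x=80 * scale, y=10 * scale)
    toolbar = Accessible("tool bar", "Activity", children=[share, stop])
    return Accessible("frame", "Write", children=[toolbar])


# The same lookup finds the button at both "resolutions"; only its position differs.
for scale in (1, 2):
    button = find_by_name(build_ui(scale), "push button", "Stop")
    print(scale, button.x, button.y)
```

A test script written this way would only consult coordinates at the last moment (to deliver a click), so changing the screen or retuning the UI does not invalidate it.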
We need to clean up some of our work in the toolkit widget area.  There
is some other infrastructure work that needs doing, by someone, or
eventually by us if it doesn't happen in the meanwhile.  These changes
will usually be transparent to most developers.  In some technological
ways, we're in better shape than most Windows systems.

> Are there other accessibility requirements?
> What in particular are the requirements? Support for a screen reader? 

Screen reading technology is certainly deployed/deployable in Linux
desktops.  We even have an open source synthesizer in our builds (I
think; I know I asked J5, but haven't verified it's there...)

> How does the accessibility interact with the intent to support 
> localizable or localization-free activities, in large part by leaving 
> text out of the interface entirely? 
> What is a textless application 
> supposed to do in this environment?

This only makes it easier.

> What parts of the system are going to have to comply with this 
> requirement? The mesh view, for example? The clipboard? What about, say, 
> drawing activities? The record application? 

Clearly the key activities and Sugar itself are the most important.

> Games?
> 

All of these are a continuum.  Accessibility requirements are not
written as inflexible rules.

> Is automated testing intended for more than just battery life testing? 

I'd hardly say "just battery life testing".  Battery life, where we
are going, is *really* important.  I'd like everyone to reset their
thinking on this topic: most of the world's kids *don't* have reliable
electricity.  For them, it is the difference between learning and less
learning, or less time for kids to use your wonderful creations!

We are *not* in the typical situation of conventional systems where, once
an activity runs "fast enough", we can stop working on performance.
Making our software more efficient, whether by using fewer cycles through
optimization or by taking opportunities to suspend and save yet more
joules, translates *directly* into how much time a child has to learn
rather than spend generating power, and into the cost of electricity
infrastructure.

Our hardware's power use is much more directly tied to your application's
behavior than that of conventional systems: when running, you can
easily make a factor of two difference in power usage between idling
and full CPU usage.  And for many applications, the difference between
running and being suspended with the DCON on is much larger still.

If we can't measure our energy usage in a realistic fashion, we can't
track progress in this performance area.  We can't rely on activity
developers' typical behavior to do this optimization: it is too complex
to do without measurement, and you can't instrument individual machines
(though we now have one highly instrumented machine to make the
measurements on).  So we have to do this centrally (the Tinderbox will
be integrated with power measurement hardware we have).
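As a rough illustration of what "measuring energy usage" boils down to, here is a minimal Python sketch, assuming the measurement hardware yields (time in seconds, power in watts) samples; the function names and the sample values are made up for illustration and are not the Tinderbox's actual interface or data. Energy is the integral of power over time, approximated here with the trapezoidal rule.

```python
from typing import Sequence, Tuple

# Assumed sample format: (time in seconds, power in watts), sorted by time.
Sample = Tuple[float, float]


def energy_joules(samples: Sequence[Sample]) -> float:
    """Integrate sampled power over time with the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 < t0:
            raise ValueError("samples must be sorted by time")
        # Area of one trapezoid: mean power over the interval times its length.
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def average_watts(samples: Sequence[Sample]) -> float:
    """Average power over the whole sampled interval."""
    duration = samples[-1][0] - samples[0][0]
    if duration <= 0:
        raise ValueError("need samples spanning a positive time interval")
    return energy_joules(samples) / duration


# Made-up samples: 1 s at a steady 2 W, then a 1 s ramp up to 4 W.
samples = [(0.0, 2.0), (1.0, 2.0), (2.0, 4.0)]
print(energy_joules(samples))   # 5.0
print(average_watts(samples))   # 2.5
```

Reporting joules for a fixed, scripted workload (rather than instantaneous watts) is what makes runs on different builds comparable.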

It will also help with smoke testing of applications between builds,
catching a fair number of the common regressions.
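One way such a between-build check could work is sketched below in Python, assuming each build produces a per-activity energy figure (in joules) for the same scripted workload. The function name, the 5% tolerance and the numbers are hypothetical. An activity whose energy grew beyond the tolerance is flagged, and one with no result at all (e.g. its workload failed to run) is reported separately, which is the smoke-test part.

```python
from typing import Dict, List, Tuple


def flag_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.05,
) -> Tuple[Dict[str, float], List[str]]:
    """Compare per-activity energy (joules) between two builds.

    Returns (regressions, missing): regressions maps an activity to its
    fractional energy increase when that increase exceeds `tolerance`;
    missing lists activities with no result in the candidate build.
    """
    regressions: Dict[str, float] = {}
    missing: List[str] = []
    for activity, base_j in baseline.items():
        new_j = candidate.get(activity)
        if new_j is None:
            missing.append(activity)  # workload didn't run: a smoke-test failure
            continue
        if base_j <= 0:
            raise ValueError(f"baseline energy for {activity} must be positive")
        change = (new_j - base_j) / base_j
        if change > tolerance:
            regressions[activity] = change
    return regressions, sorted(missing)


# Hypothetical per-activity results for two builds (not real measurements).
baseline = {"Write": 100.0, "Record": 200.0, "Browse": 150.0}
candidate = {"Write": 101.0, "Record": 250.0}
print(flag_regressions(baseline, candidate))  # ({'Record': 0.25}, ['Browse'])
```

A relative tolerance keeps ordinary measurement noise from flagging every build; the right threshold would have to come from how repeatable the real measurements turn out to be.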

> If not, is it really necessary for every activity to support it? If so, 
> what do you expect to accomplish? Will it actually save more than the 
> amount of time taken to implement it for a given activity?

Probably not: but we'll care about core applications.  I don't care
(much) if blockparty breaks, nor are arbitrary games representative of a
workload in school (maybe out of it though ;-)).

> 
> What are the time constraints?

Little in the very short term: so long as we have a single screen
resolution, testing is easier.  The constraints will grow with time, as
we have to worry about multiple platforms and need to improve our work
further.

> 
> The potential scope is huge...it would be nice to understand the actual 
> requirements.

Much of what needs to be done is infrastructure work, rather than work
activity developers themselves will be doing.  Eben will be adding the
"basic hygiene" to the HIG for accessibility.  Most of it is common
sense.  We're working on the power measurement infrastructure and
integration with Tinderbox.

My note is to make people aware that most of this work is useful for
several purposes: power efficiency, which is vital for most kids, and
accessibility, which is vital for some kids.
                                 - Jim

> 
>    Kent
> 
> 
> 
> Jim Gettys wrote:
> > I want to give everyone a heads up to start thinking about the following
> > issues:
> >
> > 1) we need to be able to do basic automated smoke testing of activities.
> > 2) we need to support accessibility in activities.  Some of you may not
> > be aware of it, but this is a non-optional requirement of some
> > jurisdictions, most of which roughly follow US government laws in this
> > area.
> > 3) we need to be able to quantify improvements/regressions in activities'
> > energy usage, as we optimize various code in our system.
> > 4) we need to be able to automate testing of realistic workloads (or
> > play loads :-), of our systems that roughly simulate the use of a child
> > in school, so we can see how we're doing when we change various knobs
> > that we have for controlling power usage, from backlight, to use of the
> > dcon, to blanking the screen, to suspending aggressively, etc.
> > Applications adding hints in key locations that suspending might be a
> > good thing to do are also becoming possible, as our power management
> > infrastructure improves.
> >
> > But if we can't reproduce results, we'll be in fog, unable to see what
> > direction to go.
> >
> > We'll therefore need to be able to script applications.  So long as
> > we're on an XO-1 with its single screen resolution *and* you don't change
> > the UI, it's not all that hard.  But we expect all of you to want to tune
> > your UIs, and we also need to ensure accessibility needs get met.
> > Future systems our software runs on may have different screens; this
> > model will break down pretty quickly.
> >
> > Note that the hooks required for accessibility make it possible
> > to script activities by name, rather than by X/Y coordinates on the
> > screen, and to wait for results; this technology can therefore
> > remove the screen resolution dependence of such scripting.  Custom
> > widgets you build will need such hooks in any case.
> >
> > We'll shortly be setting up a battery life testing infrastructure in a
> > revived Tinderbox; with the machine we have instrumented with more than
> > 20 measurement points, we can gather a great deal of useful information
> > on the behavior of our system and of individual activities.
> >
> > At some point, we'll start asking you to generate a workload for an
> > activity, which should be able to address many of the issues above.
> > More when the infrastructure work is further along.
> >
> >                                - Jim
> >
> >   
> 
> 
-- 
Jim Gettys
One Laptop Per Child