Data Storage and User-facing System Requirements [was Re: [sugar] 9.1 Proposal: Files]

Thu Oct 30 14:26:10 EDT 2008

Thank you to everyone who replied to my original proposal.  I missed a
crucial point: I must clearly connect the user-facing requirements to
the solutions my proposal provides.  I am rewriting it with this pattern
in mind.  If you wish to reply on the mailing list, please write *one*
email per point, so that the threads stay manageable.  (I will post this
on the OLPC wiki shortly and reply with a link.)

Proposal:

Allow activities which run under Sugar to write regular POSIX files.
Where sensible and possible with respect to user-facing requirements,
restructure our systems to expect them to behave as such.

I am basing this proposal on the idea that the system has, at the very
least, the following requirements.  The system (Sugar on the XO) should:

    1) Be bug-free, stable
    2) Allow users to save data
    3) Allow users to find data later
    4) Allow users to share data (collaborate)
    5) Afford users wide choice in the applications they can run
    6) Allow users to modify the system

From reading mailing lists and IRC logs, and from listening to
conversations, it seems that we are trying to resolve all of these
issues by writing more code to work around difficulties imposed by
our current data storage implementation and security model.

My argument is that we can do less work and get an improved result from
the user's perspective by removing the layers of code (datastore and
security restrictions) which prevent applications from behaving as they
normally do on other systems.  For practical reasons this has immediate
ramifications for the requirements 1-5 listed above.  I will now
discuss the use of files with respect to these requirements:

1) Be bug-free, stable:

Every line of new code written must be learned (both its behavior and
its documentation) by everyone who wants to work within the framework it
creates.  Given this, it is preferable to write less code of our own and
to share existing solutions and design patterns with the larger open
source software development community.  I base this on the commonly
accepted FLOSS theory that, given feedback systems which connect users
and developers, bugs are a function of users, time, code complexity, and
changes to the codebase over time. Roughly:

     complexity * changes
     --------------------   ==  number of bugs
     users * time in use

Following this theory, writing our own systems decreases (users * time
in use) and increases (complexity * changes), thus increasing the number
of bugs.  Using an existing system applies roughly the same bug function
found in that system to our case, provided our specific use doesn't
markedly increase complexity in code paths.
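
To make the direction of the argument concrete, here is a minimal
sketch of the ratio above.  The function name and every number in it are
hypothetical, chosen only to show how the score moves; it produces a
unitless figure for comparing two options, not a bug count.

```python
def relative_bug_load(complexity, changes, users, time_in_use):
    """Return (complexity * changes) / (users * time_in_use).

    A unitless score for comparing options, following the rough
    formula in the text.  It is not a prediction of bug counts.
    """
    if users <= 0 or time_in_use <= 0:
        raise ValueError("users and time_in_use must be positive")
    return (complexity * changes) / (users * time_in_use)


# Hypothetical inputs: identical code complexity and churn, but the
# shared upstream code has far more users and more time in use.
custom_code = relative_bug_load(
    complexity=5, changes=40, users=1_000, time_in_use=1
)
shared_code = relative_bug_load(
    complexity=5, changes=40, users=1_000_000, time_in_use=10
)

print(custom_code > shared_code)
```

With complexity and change rate held equal, the score falls purely
because (users * time in use) grows, which is the effect described
above.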

2) Allow users to save data:

Our users have potentially unique sub-requirements on this point.  I
have gleaned the following from discussions on OLPC-related mailing
lists and in IRC logs.  The system must:

  a. Automatically save data.

  b. Encourage use of names to make it easier to find things.

  c. Not require names for things for which naming is tedious or
    unnecessary, such as photos.

Currently we meet all of these requirements, but the implementation we
use to do so has caused problems for users in some regards:

  a. Automatically saving everything confuses users by mixing things
which they find important and things which they don't care about (the
'journal spam' problem).
  b. We don't encourage naming strongly enough, even though doing so is
incredibly simple in the current UI.  This confuses users when they try
to find things later.
  c. We do this very well, but the lack of distinction in how we display
items in the Journal has caused issues for users (e.g. it is necessary
to click on photos to see previews of them instead of having the photos
visible in the top-level view).

Generally I have heard that most people think we need to resolve (a) and
(b) by more strongly encouraging naming.  (c) is little discussed, but
is something that 'traditional' file browsers such as Nautilus seem to
do quite well.

My impression is that if we switch to a 'stronger enforcement of naming'
scheme we may resolve (a) and (b).  I also note that doing so brings us
much closer to the 'save named files' pattern under which non-Sugar
applications function.
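
As an illustration of how named and unnamed items could coexist as
regular files, here is a minimal sketch.  The function save_entry, the
"unnamed-" prefix, and the demo file names are all hypothetical; this is
not how the current datastore works.

```python
import os
import tempfile
import uuid


def save_entry(directory, data, name=None):
    """Write data as a regular file in directory and return its path.

    A user-supplied name is used when one is given.  Otherwise an
    opaque name is generated, so naming is never required.
    """
    os.makedirs(directory, exist_ok=True)
    # Keep only the final path component so a name cannot point
    # outside the save directory.
    filename = os.path.basename(name) if name else ""
    if not filename:
        filename = "unnamed-" + uuid.uuid4().hex
    path = os.path.join(directory, filename)
    with open(path, "wb") as handle:
        handle.write(data)
    return path


# Demonstration in a throwaway directory.
demo_dir = tempfile.mkdtemp()
named = save_entry(demo_dir, b"my story", name="story.txt")
unnamed = save_entry(demo_dir, b"raw photo bytes")
```

The named item lands where any other application would expect to find
it, while the unnamed one still exists as an ordinary file that an
indexer could present by date or preview.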

3) Allow users to find data later:

Sugar's Journal is built around the idea that it should be easy for
users to find data they have produced.  The exposition of this
requirement would be:

  a. Provide application(s) which allow for the browsing and location of 
    past work in unique ways (search, tagging, metadata).  The current
    Sugar Journal is an example.  Other similar examples are Beagle, 
    Tracker, and Pinot.

To help users find data later we must provide a way to browse all data
creation events on the system.  This implies a need to index all data
events.  We have chosen a design in which we log all user data events
by providing a single point of entry for these events (the datastore
provides filenames for all new files).  This makes it easy to log user
data-generation activity in userspace.  Desktop search systems
(Beagle, Tracker, Pinot) use inotify to achieve this result.  They do
not require a single point of entry to the filesystem in userspace to
enable logging because they integrate with upstream kernel mechanisms
which do exactly that.

Provided we are able to log and process filesystem events, there need
be no difference in functionality in this regard whether we use files
selected by the user, by applications, or by a custom userspace system
such as the datastore.  There are complexity issues with setting large
numbers of inotify watches, for instance, and for performance and
implementation reasons the Linux kernel does not provide a simple
stream of filesystem events.  These concerns appear to have influenced
the current datastore design.  Such issues exist for many applications
whose requirements are equivalent to the Journal's, and we might well
serve all of them by working on the upstream filesystem change tracking
issue.
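
To show that change events can be recovered from ordinary files without
routing every write through a single API, here is a minimal sketch.
Note that it is NOT inotify: Python's standard library has no inotify
binding, so this stand-in compares two directory snapshots instead.
The function names and demo file names are hypothetical.

```python
import os
import tempfile


def snapshot(root):
    """Map every file under root to its (mtime_ns, size)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                info = os.stat(path)
            except OSError:
                continue  # the file vanished between listing and stat
            state[path] = (info.st_mtime_ns, info.st_size)
    return state


def diff_snapshots(old, new):
    """Return (event, path) pairs describing what changed."""
    events = []
    for path in sorted(new.keys() - old.keys()):
        events.append(("created", path))
    for path in sorted(old.keys() - new.keys()):
        events.append(("deleted", path))
    for path in sorted(new.keys() & old.keys()):
        if new[path] != old[path]:
            events.append(("modified", path))
    return events


# Demonstration in a throwaway directory.
root = tempfile.mkdtemp()
first = os.path.join(root, "drawing.png")
with open(first, "wb") as handle:
    handle.write(b"v1")
before = snapshot(root)

second = os.path.join(root, "essay.txt")
with open(second, "wb") as handle:
    handle.write(b"hello")
with open(first, "wb") as handle:
    handle.write(b"version two")  # the size changes, so the diff sees it

events = diff_snapshots(before, snapshot(root))
```

Polling like this scales poorly on large trees, which is one reason the
desktop search tools rely on kernel notification instead.  The sketch is
only meant to show that an index can be built from plain files.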

4) Allow users to share data:

Data sharing is a fundamental requirement of a collaborative computing
environment.  In 8.2 the majority of collaboration is synchronous
(realtime).  We use a custom set of services, APIs, and applications to
enable this collaboration.  Asynchronous collaboration, however, is
poorly supported on the system outside of internet services available to
all computers running a web client.

One property of the XO is that all machines automatically associate to
form a mesh network.  As such they all potentially have access to
services running on other machines.

Provided we use files, a simple method to meet this requirement would be
to run a webserver on every XO which hosted a directory listing of the
user's default save directory (their home directory, for instance).
What is required is that users give names to the files they want to
share, so that they can sensibly be integrated into the POSIX
filesystem.  Files that the user did not want to name or share could
continue to have human-illegible names on the 'underlying' filesystem.

This feature could be provided by any of a wide range of existing
webservers.  All that is required is that the user saves files with
names intelligible to them and to the other users with whom they wish
to share.  Relying on webserver directory listings gives us an ample
selection of existing applications with which to provide this
functionality.
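
As one illustration of how little code such a service needs, here is a
minimal sketch using the web server in Python's standard library.  The
function serve_directory, the temporary directory standing in for the
user's save directory, and the file name are assumptions made for the
example; any other webserver with directory listings would do equally
well.

```python
import functools
import http.server
import os
import tempfile
import threading
import urllib.request


def serve_directory(directory, host="127.0.0.1", port=0):
    """Serve a listing of directory over HTTP; return the server.

    Port 0 asks the operating system for any free port.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer((host, port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


# Demonstration: a temporary directory stands in for the save directory.
shared = tempfile.mkdtemp()
with open(os.path.join(shared, "my-story.txt"), "w") as handle:
    handle.write("Once upon a time")

server = serve_directory(shared)
url = "http://127.0.0.1:%d/" % server.server_address[1]

# Bypass any configured HTTP proxy for this local request.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(url) as response:
    listing = response.read().decode("utf-8")

server.shutdown()
server.server_close()
```

The sketch binds to the loopback address so that it is safe to run
anywhere.  Sharing with other XOs would mean binding to an address
reachable over the mesh, and choosing which directory to expose.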

5) Afford users wide choice in the applications they can run:

This requirement is one that I have frequently heard from internal
employees, deployment employees, teachers, and learning team members.  I
am certainly paraphrasing their requests and observations, but the
general sentiment is clear to any computer user.  One wants to be able
to use their general purpose computing system for as wide a variety of
applications as possible.  It is the applications which give value to
any computing system--- my XO wouldn't have any value to me at all if
the only thing I could run was Calculate.  I would be as happy with a $2
solar-powered calculator!

In my opinion 8.2 does not meet this requirement, as it is extremely
difficult to run what have been termed 'legacy' applications--- that
is, applications which are not specifically designed for use in the
Sugar environment.  To operate completely within Sugar, Linux
applications must be modified to understand our unique environment.
The following departures from a default Linux environment cause these
issues and necessitate the porting of these applications:

    - Datastore/Journal integration
    - Activity isolation (security system)

It is unclear to me which one of these ideas is more central in the
current design.  Integrating an application with the datastore, and
thus the Journal, is important to our current design because we have
chosen to produce a user-activity indexer completely in userspace, and
thus require all persistent data creation events to be routed through a
single custom API.  But this design dovetails nicely with the security
model, which distrusts applications that the user has run, and forces
their creation of data to be mediated by the 'trusted' datastore
application.

What is clear is that neither of these points of departure from a
typical Linux distribution simplifies the process of getting more
applications to run on the XO.  The unique hardware environment (small,
high-res screen, keyboard, problematic touchpad) already imposes enough
problems on developers who wish to port applications to work well on the
XO.  These points of departure certainly add to the work required to get
applications running on the XO.

If we drop these environmental restrictions, applications on the XO
could write files as they do on other systems.  This would make it
simpler for them to run and would widen the choice that users have in
what software they run.  It should still be possible to improve the
integration of a given activity with the XO environment by working with
APIs which we provide (for example, to add better metadata to our
Journal, or to integrate with Sugar-specific realtime collaboration
systems).  But removing these points of incompatibility should also
make it possible to simply run 'legacy' applications.

6) Allow users to modify the system:

One of the often-stated design goals of Sugar was to build a system
which would be configurable and hackable by its users.  This goal has
justified a number of crucial design decisions, such as the choice of
Python as a system programming language for the Sugar UI and its
associated Activity APIs.

A mismatch exists between the data view which users have access to
from within Sugar (log of activity usage) and that which defines the
underlying system (hierarchical filesystem).  On 8.2 users have the
capability to modify the system but may only browse and edit files on
the system by using the Terminal application (which is permitted to do
so by the security implementation).  This doesn't encourage users to
understand the hierarchical and file-based nature of the underlying
system.

Overall, I think it eminently possible to restructure the OLPC
software stack to use files without causing a failure to meet any of
the above requirements.  Particularly on requirements 1, 4, 5, and 6
it appears that the resulting system would better meet the
requirements of our users than the current implementation.  We should
expect no significant best-case change in 2 and 3, but it may be that
by using files we can improve the user-perceived stability and
predictability of the system and thus better meet these requirements as
well.

In all cases it appears that less work would be required of OLPC
developers, volunteers, and users to maintain and use such a system
within a dispersed open source software environment.  This is the
crux of my argument for the use of files within the XO software stack.
We are limited in our capacity to write and maintain code, and as such
should seek ways to distribute or share the costs of maintaining our
software stack outside of OLPC, Sugarlabs, and our internal
development processes.  While we may be able to sustain a custom
approach to data storage and access, doing so will certainly crowd out
contribution to our system and isolate us going forward.  I see no way
in which this will improve the quality of experience we give to our
users.

Erik