[Olpc-sysadmin] Volunteer Infrastructure Group Thoughts and Priorities
Ed McNierney
ed at laptop.org
Wed Jul 29 07:00:33 EDT 2009
Folks -
Since the VIG was created to assist OLPC sysadmin staff, and since
OLPC now has no dedicated sysadmin staff (Chris Ball and I are
covering the work part-time) I think it's probably a good time to
rethink and recalibrate how the VIG operates. Here are my thoughts on
what we could do, and I'd appreciate hearing other comments and input.
1. OLPC Sysadmin Priorities - Since Chris and I are now the people
primarily responsible for sysadmin duties, I'd like to make sure
everyone understands what our top priorities are.
a. By far the most serious issue we're facing is the extreme load
placed on pedal.laptop.org for wiki access. We have reconfigured and
tuned the squid proxy on weka and it seems to be working very well.
But the uncached requests, especially during the school day in
Uruguay, produce a very heavy load on both pedal.laptop.org and
crank.laptop.org, the latter because so many of the Activities linked
from the wiki are hosted there. Since pedal runs the
apache2 server for the wiki, the MediaWiki code, and the MySQL
database it uses, it gets overloaded. And since that same machine
hosts our postfix and mailman services, it's even worse. When
Uruguayan schoolchildren arrive in the morning, OLPC has difficulty
sending mail, and that's a real problem. The poor performance of the
wiki and crank affects a very wide community of users.
b. We are managing far too many physical and virtual machines. We are
not a very large organization, yet we have literally a few dozen
servers and an untold number of VMs running on them. We cannot manage
such a large set of systems. I don't mean that from a "we need more
VIG help to manage them" perspective, but rather that our risk of
failure and the workload involved in keeping these machines running
and secure are much, much larger than they need to be.
c. Our network and systems documentation is in very poor shape. The
chief value of this documentation is that new people coming in to
the task (like cjb and me) can look in one place and get all the
important information they need about each machine. That can't be
done right now, and we have quite a bit of information that's either
confusing, inaccurate, or missing. This is a very serious hindrance
to any cleanup/consolidation work for priority 1(b) above.
d. Systems support for the XO-1.5 effort is a relatively small amount
of work, but a high priority. This mainly consists of setting up
build systems for OS and firmware releases, and for the F11-on-XO-1
effort.
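To put rough numbers on the uncached load described in 1(a), something like the following Python sketch could tally the requests squid had to pass through, grouped by hour. It assumes the proxy writes squid's native access.log format (timestamp, elapsed time, client, result code/status, ...); the function name is made up for illustration, and I have not checked it against our actual log configuration.

```python
from collections import Counter
from datetime import datetime, timezone


def uncached_requests_per_hour(log_lines):
    """Count requests squid could not serve from cache, keyed by UTC hour.

    Assumes squid's native access.log layout, where field 0 is a Unix
    timestamp and field 3 looks like "TCP_MISS/200".
    """
    misses = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        try:
            timestamp = float(fields[0])
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        result_code = fields[3].split("/")[0]
        if "MISS" in result_code:
            hour = datetime.fromtimestamp(timestamp, tz=timezone.utc).hour
            misses[hour] += 1
    return misses
```

Feeding it the lines of an access.log and sorting the result by hour should show whether the misses cluster around the Uruguayan school day, which would help decide what to tune next.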
2. VIG Priorities - From what I can tell from the VIG meeting logs,
the current set of VIG priorities is completely separate from the
ones I listed above. I think that tells us two main things: that the
VIG is currently serving a set of needs for the broader OLPC volunteer
community, and that the VIG's interests have moved away from the
original focus of supporting OLPC sysadmin work.
3. Short-Term OLPC Sysadmin Action Items - We are currently overloaded
yet have underutilized hardware.
a. Cjb and I successfully moved weka.laptop.org from 1CC to W91 a few
weeks ago, completing a long-standing sysadmin priority to help get
all "public-facing" systems and services out of the 1CC server room
and into the much better colo environment of W91. My next priority is
to reconfigure owl.laptop.org (currently almost unused) to be tuned as
a MySQL database server system with a RAID 1+0 hard disk
configuration. This will let us rebuild owl as a semi-dedicated or
dedicated MySQL server for all our needs, but particularly for the
wiki. By doing that we will take the first step towards better wiki
load-balancing by splitting the MySQL service away from pedal. I'm
also eager to set up Atrium, an open-source project management tool
that uses MySQL, so getting the database server configured properly
will allow me to use it right away and not migrate later.
b. Account review and security - Cjb and I should have root access on
every machine OLPC has, and we need to clean up old accounts scattered
everywhere (both sudoers and others). What we have now is a huge
mess, with long-gone employees and volunteers still having complete
access to our systems. We have recently had problems with
unauthorized usage and content on our machines, incurring well-
deserved complaints from our hosts at the Media Lab and MIT. This
lack of control and security puts our very valuable hosting
relationship at risk and we need to take more responsibility for it.
I will be continuing to consolidate and eliminate unused and
unnecessary accounts. As much as possible, I'm happy to continue to
offer hosting services to the OLPC volunteer community but (at a
minimum) we need to do that on a much smaller number of machines so we
can pay attention to them.
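As a starting point for the account review in 3(b), here is a minimal Python sketch that lists accounts with a usable login shell from passwd-format text and flags the ones not on a roster of current people. The function names, the roster, and the min_uid default are all assumptions for illustration (the first regular UID differs between distributions), and it looks only at passwd entries, not at sudoers or SSH keys.

```python
def accounts_with_login_shell(passwd_text, min_uid=1000):
    """Return sorted account names that have a real login shell.

    passwd_text is the content of a passwd-format file (seven
    colon-separated fields per line). min_uid is an assumed cutoff
    for non-system accounts; adjust it per machine.
    """
    no_login = {"/sbin/nologin", "/usr/sbin/nologin", "/bin/false"}
    accounts = []
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 7:
            continue  # skip malformed entries
        name, _pw, uid, _gid, _gecos, _home, shell = fields
        if not uid.isdigit():
            continue
        if int(uid) >= min_uid and shell not in no_login:
            accounts.append(name)
    return sorted(accounts)


def accounts_needing_review(passwd_text, current_people, min_uid=1000):
    """Return login-capable accounts that are not on the current roster."""
    return [
        name
        for name in accounts_with_login_shell(passwd_text, min_uid)
        if name not in current_people
    ]
```

Run against each machine's passwd file with one shared roster, this would give a per-machine list of accounts to confirm or remove, which is easier to act on than reading the files by hand.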
4. Suggested VIG Transition - It seems to me that the combination of
3(b) and 2 above suggests an opportunity to more clearly focus the
VIG's efforts on the wider OLPC volunteer community, and to do so in a
more structured manner. This would suggest something like:
a. The designation of some small number of physical servers to be
allocated to VIG management for VIG-initiated and VIG-coordinated
projects and services. These machines would be (as much as practical)
under the management of the VIG to support the broader OLPC community,
and Chris and I would pay little attention to them.
b. The VIG would more clearly not be responsible for "internal" OLPC
systems management tasks, such as our public wiki, mailing services,
MySQL configurations, build machine hosting, etc. As far as I can
tell this is really a formalization of a change that's already
happened, since there's been very little VIG activity on these topics
lately.
c. We would create a formal relationship between the VIG and the
senior OLPC staff member responsible for systems and services
(currently me). The VIG's contact might be the secretary of the VIG or
some new "coordinator" role - I would leave that process up to the VIG
community to decide. These two people would be responsible for all
VIG-OLPC communication and coordination. If we successfully decouple
the two systems management efforts (internal "OLPC" and external
"VIG") then we should be able to narrow the need for communication so
that two people can handle it easily.
Summary - I would like to help the VIG succeed better in the OLPC
world of 2009 and beyond, and I need to be more comfortable that
OLPC's internal systems management is going in the direction I need,
while minimizing the risk that well-intentioned people end up stepping
on each other's toes and breaking things (or working at
cross-purposes). I'm very open to tweaks to the above suggestions or
completely new suggestions. Please let me know what you think, and
thanks very much for your past and future help!
- Ed