[Olpc-sysadmin] Volunteer Infrastructure Group Thoughts and Priorities

Ed McNierney ed at laptop.org
Wed Jul 29 07:00:33 EDT 2009


Folks -

Since the VIG was created to assist OLPC sysadmin staff, and since  
OLPC now has no dedicated sysadmin staff (Chris Ball and I are  
covering the work part-time), I think it's probably a good time to
rethink and recalibrate how the VIG operates.  Here are my thoughts on  
what we could do, and I'd appreciate hearing other comments and input.

1. OLPC Sysadmin Priorities - Since Chris and I are now the people  
primarily responsible for sysadmin duties, I'd like to make sure  
everyone understands what our top priorities are.

a. By far the most serious issue we're facing is the extreme load  
placed on pedal.laptop.org for wiki access.  We have reconfigured and  
tuned the squid proxy on weka and it seems to be working very well.   
But the uncached requests, especially during the school day in
Uruguay, produce a very heavy load on both pedal.laptop.org and
crank.laptop.org, the latter because of the large number of Activities
linked to by the wiki that are hosted there.  Since pedal runs the
apache2 server for the wiki, the MediaWiki code, and the MySQL  
database it uses, it gets overloaded.  And since that same machine  
hosts our postfix and mailman services, it's even worse.  When  
Uruguayan schoolchildren arrive in the morning, OLPC has difficulty  
sending mail, and that's a real problem.  The poor performance of the  
wiki and crank affects a very wide community of users.

b. We are managing far too many physical and virtual machines.  We are
not a very large organization, yet we have a few dozen
servers and an untold number of VMs running on them.  We cannot manage
such a large set of systems.  I don't mean that from a "we need more
VIG help to manage them" perspective, but rather that our risk of
failure and the workload involved in keeping these machines running
and secure are much, much larger than they need to be.

c. Our network and systems documentation is in very poor shape.  The  
chief value of this documentation is that new people coming in to
the task (like cjb and me) can look in one place and get all the
important information they need about each machine.  That can't be  
done right now, and we have quite a bit of information that's either  
confusing, inaccurate, or missing.  This is a very serious hindrance  
to any cleanup/consolidation work for priority b. above.

d. Systems support for the XO-1.5 effort is a relatively small amount  
of work, but a high priority.  This mainly consists of setting up  
build systems for OS and firmware releases, and for the F11-on-XO-1  
effort.
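
To illustrate the "one place" documentation goal in 1(c), here is a
minimal sketch of a per-machine record that can report which facts are
still undocumented.  The field names (location, kind, services, admins)
and the helper missing_fields are assumptions made up for this example,
not an existing OLPC schema; the sample values come only from what is
stated in this message.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class MachineRecord:
    """One inventory entry per machine. All field names are hypothetical."""
    hostname: str
    location: Optional[str] = None        # e.g. "W91" or "1CC"
    kind: Optional[str] = None            # "physical" or "vm"
    services: Optional[List[str]] = None  # what the machine runs
    admins: Optional[List[str]] = None    # who is expected to have root


def missing_fields(record: MachineRecord) -> List[str]:
    """Return the names of fields that are unset or empty."""
    return [
        f.name
        for f in fields(record)
        if getattr(record, f.name) in (None, "", [])
    ]


# Sample entries using only facts stated in this message.
weka = MachineRecord("weka.laptop.org", location="W91", services=["squid"])
pedal = MachineRecord(
    "pedal.laptop.org",
    services=["apache2", "MediaWiki", "MySQL", "postfix", "mailman"],
)

for machine in (weka, pedal):
    print(machine.hostname, "is missing:", missing_fields(machine))
```

A report like this would make the gaps visible before any
consolidation work starts, rather than discovering them machine by
machine.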

2. VIG Priorities - From what I can tell from the VIG meeting logs,
the current set of VIG priorities is completely separate from
the ones I listed above.  I think that tells us two main things: that
the VIG is currently serving a set of needs for the broader OLPC
volunteer community, and that the VIG's interests have moved away from
the original focus of supporting OLPC sysadmin work.

3. Short-Term OLPC Sysadmin Action Items - We are currently overloaded  
yet have underutilized hardware.

a. Cjb and I successfully moved weka.laptop.org from 1CC to W91 a few  
weeks ago, completing a long-standing sysadmin priority to help get  
all "public-facing" systems and services out of the 1CC server room  
and into the much better colo environment of W91.  My next priority is  
to reconfigure owl.laptop.org (currently almost unused) to be tuned as  
a MySQL database server system with a RAID 1+0 hard disk  
configuration.  This will let us rebuild owl as a semi-dedicated or  
dedicated MySQL server for all our needs, but particularly for the  
wiki.  By doing that we will take the first step towards better wiki  
load-balancing by splitting the MySQL service away from pedal.  I'm  
also eager to set up Atrium, an open-source project management tool  
that uses MySQL, so getting the database server configured properly  
will allow me to use it right away and not migrate later.

b. Account review and security - cjb and I should have root access on
every machine OLPC has, and we need to clean up old accounts scattered  
everywhere (both sudoers and others).  What we have now is a huge  
mess, with long-gone employees and volunteers still having complete  
access to our systems.  We have recently had problems with  
unauthorized usage and content on our machines, incurring
well-deserved complaints from our hosts at the Media Lab and MIT.  This
lack of control and security puts our very valuable hosting  
relationship at risk and we need to take more responsibility for it.   
I will be continuing to consolidate and eliminate unused and  
unnecessary accounts.  As much as possible, I'm happy to continue to  
offer hosting services to the OLPC volunteer community but (at a  
minimum) we need to do that on a much smaller number of machines so we  
can pay attention to them.
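
As a rough sketch of the account review in 3(b), the following reads
text in the standard seven-field /etc/passwd format and lists accounts
that can log in but are not on an approved list.  The account names,
the approved set, and the function names are all hypothetical; a real
review would also need to cover sudoers entries and SSH keys, which
this does not attempt.

```python
# Shells that conventionally mean "this account cannot log in".
NOLOGIN_SHELLS = {"/usr/sbin/nologin", "/sbin/nologin", "/bin/false"}


def login_accounts(passwd_text):
    """Return account names whose shell field permits interactive login."""
    accounts = []
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(":")
        if len(parts) != 7:
            continue  # skip malformed entries rather than guess
        name, shell = parts[0], parts[6]
        if shell not in NOLOGIN_SHELLS:
            accounts.append(name)
    return accounts


def stale_accounts(passwd_text, approved):
    """Return login-capable accounts that are not in the approved set."""
    return sorted(set(login_accounts(passwd_text)) - set(approved))


# Hypothetical sample data, not taken from any OLPC machine.
SAMPLE = """\
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000::/home/alice:/bin/bash
bob:x:1001:1001::/home/bob:/bin/sh
"""

print(stale_accounts(SAMPLE, approved={"root", "alice"}))
```

Running the same check on each machine against one shared approved
list would give a per-host list of accounts to review.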

4. Suggested VIG Transition - It seems to me that the combination of  
3(b) and 2 above suggests an opportunity to more clearly focus the  
VIG's efforts on the wider OLPC volunteer community, and to do so in a  
more structured manner.  This would suggest something like:

a. The designation of some small number of physical servers to be  
allocated to VIG management for VIG-initiated and VIG-coordinated  
projects and services.  These machines would be (as much as practical)  
under the management of the VIG to support the broader OLPC community,  
and Chris and I would pay little attention to them.

b. The VIG would more clearly not be responsible for "internal" OLPC  
systems management tasks, such as our public wiki, mailing services,  
MySQL configurations, build machine hosting, etc.  As far as I can  
tell this is really a formalization of a change that's already  
happened, since there's been very little VIG activity on these topics  
lately.

c. We would create a formal relationship between the senior OLPC
staff member responsible for systems and services (currently me) and
a designated VIG counterpart.  That counterpart might be the secretary
of the VIG or some new "coordinator" role - I would leave that process
up to the VIG community to decide.  These two people would be
responsible for all
VIG-OLPC communication and coordination.  If we successfully decouple  
the two systems management efforts (internal "OLPC" and external  
"VIG") then we should be able to narrow the need for communication so  
that two people can handle it easily.

Summary - I would like to help the VIG succeed better in the OLPC  
world of 2009 and beyond, and I need to be more comfortable that  
OLPC's internal systems management is going in the direction I need,  
while minimizing the risk that well-intentioned people end up stepping
on each other's toes and breaking things (or working at
cross-purposes).  I'm very open to tweaks to the above suggestions or
completely new suggestions.  Please let me know what you think, and  
thanks very much for your past and future help!

	- Ed