[Testing] [sugar] Automated testing, OLPC, code+screencasts.

Ronak Chokshi rchokshi at marvell.com
Fri Mar 28 01:49:43 EDT 2008


Inline.

Regards,
Ronak

> -----Original Message-----
> From: testing-bounces at lists.laptop.org [mailto:testing-bounces at lists.laptop.org] On Behalf Of Michail Bletsas
> Sent: Thursday, March 27, 2008 10:53 AM
> To: bens at alum.mit.edu
> Cc: bens at alum.mit.edu; Titus Brown; testing at laptop.org; testing-bounces at lists.laptop.org
> Subject: [Testing] [sugar] Automated testing, OLPC, code+screencasts.
> 
> "Benjamin M. Schwartz" <bmschwar at fas.harvard.edu> wrote on 03/27/2008 01:37:16 AM:
> 
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > Michail Bletsas wrote:
> > | testing-bounces at lists.laptop.org wrote on 03/26/2008 09:19:19 PM:
> > |
> > |> 2. Many, and perhaps most, of OLPC's remaining difficult bugs are related
> > |> to the network.  They are most commonly related to the closed wireless
> > |> firmware, which is buggy and lacks key features regarding mesh routing and
> > |> multicast.
> > |
> > | Can you qualify your statement?
> > I have seen the wireless hardware silently drop all outgoing packets but
> > continue to route incoming packets for several minutes, until forcibly
> > reset by the user (about a month ago).  The firmware is so unstable that
> > the wireless driver even contains a mechanism to recognize when the
> > firmware has wedged and reset it.  This is what I mean by buggy.
> 
> So according to your thinking, everything that has a reset button is buggy.
> I guess that, technically speaking, you are correct ;-)
> I also tend to believe that a "thick" firmware like the one that we use on
> the 8388 will always have bugs, given that it is several hundred thousand
> lines of code, so I don't feel bad for putting the reset functionality
> there in the first place.
> 
> You are also very quick to point fingers at the firmware for everything
> that goes wrong with the networking subsystem of the laptop.
> The behavior that you are describing can be explained when the wireless
> firmware doesn't communicate with the host CPU and is only forwarding
> frames for other mesh nodes. There has also been a major rewrite of the
> (completely open source) driver in use with the laptop, after which we
> started to see that behavior (which was not observed before the rewrite).
> It is very easy to point fingers on religious grounds; it is much more
> difficult to fix problems.
> 
> 
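A wedge detector of the kind mentioned above can be sketched as a simple host-side watchdog. The OLPC driver's actual mechanism is not shown in this thread, so `FirmwareWatchdog`, its five-second timeout and its callbacks are hypothetical. The sketch declares the firmware wedged only when the host has work outstanding (a queued transmit or command) and no completion arrives within the timeout; a firmware that is busy forwarding frames for other mesh nodes while the host is idle, as described above, never trips it.

```python
import time
from typing import Callable, Optional


class FirmwareWatchdog:
    """Hypothetical host-side watchdog: reset the firmware only if the host has
    work outstanding and nothing completes within timeout_s seconds."""

    def __init__(self, reset_fn: Callable[[], None], timeout_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._reset_fn = reset_fn
        self._timeout_s = timeout_s
        self._clock = clock
        self._pending_since: Optional[float] = None  # when outstanding work began
        self.resets = 0

    def on_submit(self) -> None:
        """Call when a frame or command is handed to the firmware."""
        if self._pending_since is None:
            self._pending_since = self._clock()

    def on_completion(self) -> None:
        """Call on any TX-done or command response; treated as proof of life."""
        self._pending_since = None

    def poll(self) -> bool:
        """Call periodically; returns True if a reset was triggered."""
        if self._pending_since is None:
            return False  # host idle: mesh forwarding alone never trips this
        if self._clock() - self._pending_since < self._timeout_s:
            return False
        self._reset_fn()
        self.resets += 1
        self._pending_since = None
        return True


# Usage with a fake clock: a submitted frame never completes, so poll() resets.
now = [0.0]
wd = FirmwareWatchdog(lambda: print("firmware reset"), timeout_s=5.0,
                      clock=lambda: now[0])
wd.on_submit()
now[0] = 6.0
wd.poll()
```

Treating any completion as progress is a simplification; a real driver would more likely track each outstanding command or transmit separately.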
> >
> > | What features does it lack when it comes to mesh routing?
> > For me, the #1 missing feature is whitelisted wake-on-multicast.  To be
> > specific, it should be possible for the firmware to be told which
> > multicast addresses refer to this host.  The firmware would then wake up
> > the CPU only when a multicast packet arrives with a destination that is on
> > the whitelist.  Without this feature, we are forced to choose between
> > never waking on multicast (and missing lots of important packets) and
> > waking up on every single multicast packet, which essentially means never
> > sleeping at all.
> 
> First of all, what you are describing is standard WOL (Wake-on-LAN)
> behavior, which was left out of the original spec of the mesh firmware in
> favor of the more general wakeup on broadcast, multicast or unicast.
> Marvell is working on adding it.
> So, no bug here, just an oversight on our part, which is going to be remedied.
> 
> 
> Even with that support in place, we will still be "missing lots of
> important packets" unless we decide to wake up on every multicast frame. So
> a more specific filter is required, because you don't want to wake up on
> Avahi announcements but you do want to wake up on traffic from activities
> that you already participate in. You can do that at the application level,
> by stopping the Avahi listener before you suspend; however, that will add a
> lot of time to suspend and resume.
> 
> 


[Ronak] We are introducing a filtering mechanism in the firmware that
will allow the driver to program a handful of multicast addresses and
hence will allow the device to wake up the host on some (but not all)
multicast addresses.
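The filtered wake-up described above can be pictured roughly as follows. This is a minimal sketch, not Marvell's firmware or driver interface: `WakeFilter`, its methods and the eight-entry table size (standing in for "a handful" of addresses) are hypothetical. It assumes IPv4 multicast groups and uses the standard mapping of a group onto an Ethernet multicast MAC (01:00:5e followed by the low 23 bits of the group), so the firmware only has to compare destination MACs. The activity group 239.1.2.3 is made up; 224.0.0.251 is the mDNS group that Avahi announcements are sent to.

```python
from typing import Iterable, Set

# Hypothetical table size: "a handful" of wake addresses.
MAX_WAKE_ADDRS = 8


def ipv4_multicast_mac(group: str) -> bytes:
    """Map an IPv4 multicast group to its Ethernet MAC address:
    the fixed prefix 01:00:5e followed by the low 23 bits of the group."""
    octets = [int(part) for part in group.split(".")]
    if len(octets) != 4 or not 224 <= octets[0] <= 239:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    return bytes([0x01, 0x00, 0x5E, octets[1] & 0x7F, octets[2], octets[3]])


class WakeFilter:
    """Hypothetical model of a firmware-side multicast wake whitelist."""

    def __init__(self) -> None:
        self._table: Set[bytes] = set()

    def program(self, macs: Iterable[bytes]) -> None:
        """Driver side: load the whitelist before the host suspends."""
        table = set(macs)
        if len(table) > MAX_WAKE_ADDRS:
            raise ValueError("more wake addresses than the firmware table holds")
        self._table = table

    def should_wake_on_multicast(self, dst_mac: bytes) -> bool:
        """Firmware side: for a received multicast frame, wake the host only
        if the destination MAC is whitelisted. Unicast and broadcast wake-up
        rules are not modeled here."""
        return dst_mac in self._table


# Usage: wake for a (made-up) activity group, but not for Avahi's mDNS traffic.
wake_filter = WakeFilter()
wake_filter.program([ipv4_multicast_mac("239.1.2.3")])
print(wake_filter.should_wake_on_multicast(ipv4_multicast_mac("239.1.2.3")))    # True
print(wake_filter.should_wake_on_multicast(ipv4_multicast_mac("224.0.0.251")))  # False
```

Because only 23 bits of the group survive the mapping, 32 IPv4 groups share each MAC, so a MAC-level filter like this can still wake the host for an unwanted group, and the host would have to discard those frames itself.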


> 
> >
> > My #2 missing feature is a control for transmit gain and receive gain. By
> > decreasing gain, the range of each transmission could be reduced, turning
> > dense meshes in a single classroom into multihop meshes.  This might
> > compensate somewhat for the firmware's simplistic multicast routing. It's
> > not clear that this would work, but at present we cannot even try it.
> 
> Why would you ever want to turn a classroom into a multihop mesh?
> Just because you have a hammer doesn't make everything a nail.
> That is exactly the approach that has created all the unrealistic
> expectations about what the mesh can and cannot do.
> If you are in a classroom, an AP will always be a lot more efficient, since
> it doesn't have to deal with the mesh control plane traffic.
> 
> As far as support for transmit gain and receive gain is concerned,
> transmit power control is definitely supported, and the firmware even
> supports per-frame TX power setting. The D/A on the power amplifier used
> on the XO's module is not fast enough for that to work, so one has to
> settle for coarser-grained control. The bottom line is that this is a
> hardware limitation, not a software one.
> 
> I don't really understand what receive gain adjustment will buy you in a
> dense scenario. One of the fundamental issues with WiFi radios in general
> is that interference range is much larger than decode range. What you can
> play with is the clear channel assessment threshold; however, that is
> different from receiver gain (usually done via an AGC in the analog
> domain).
> 
> 
> >
> > Smart multicast routing is the other obvious missing feature; I appreciate
> > that this is still considered an academic research problem.
> It is, and the 802.11s standards committee is also struggling with it.
> 
> >
> > | Can you point me to a better working implementation out there when it
> > | comes to multicast routing?
> > No, I cannot.
> >
> > This wireless firmware may be the best mesh implementation in all of
> > history, in the whole world.  It's still disgustingly buggy, and has
> > already set the project back months.


[Ronak] I am not sure how true this is. But we have been through this
in an earlier email thread, and I don't feel the need to repeat the
explanations here.


> > Its multicast and wakeup behaviors
> > have forced us to drop critical features.  The software team has come to
> > regard the wireless system as so unpredictable that any task involving it
> > is "science, not programming".
> 
> Just looking at the number of bugs in Trac contradicts your statement.
> And yes, the wireless subsystem does many things that existing radios
> don't do.
> It is also asked to do "magic", as opposed to what physics realistically
> allows.
> That's the main bug with it right now. It just doesn't make spectrum out
> of thin air...
> 
> 
> >
> > I am also quite convinced that if OLPC developers were free to read the
> > source code and modify it, given access to Marvell's internal
> > documentation, we would be much further along.
> 
> That is generally true. It runs against long-established practice in the
> wireless industry that is enforced by some valid and some not-so-valid
> reasons.  Unfortunately, there is no example in the industry right now of
> an open, fully functional low-level wireless stack, and that will take
> some time to change. If the XO ends up being produced in really high
> volumes, then we will definitely revisit that. The bottom line is that
> right now the volumes of the devices that require their radios to be
> "closed" are much higher than those of the open source devices.
> 
> 
> 
> >
> > |> 3. Almost all of OLPC's major bugs are Heisenbugs.  They often don't
> > |> appear at all with only one laptop, and appear rarely until one has 12
> > |> or more laptops sharing a wireless mesh.
> > |
> > | And most of them are due to the fact that our application traffic
> > | saturates the wireless spectrum.
> >
> > Indeed.  And that is due to a mismatch between Salut, which assumes
> > efficient multicast routing, and the firmware, which doesn't provide it.
> > I know very little about the ongoing work with Cerebro, but that seems to
> > be a very reasonable next step.
> 
> Yes, it is.
> 
> 
> M.
> 
> 
> _______________________________________________
> Testing mailing list
> Testing at lists.laptop.org
> http://lists.laptop.org/listinfo/testing

