Salut and Suspend/Resume issues

Ricardo Carrano carrano at ricardocarrano.com
Wed Feb 20 07:33:45 EST 2008


John,

I believe what is under discussion here is the choice between waking on
multicast (and so keeping mDNS working) and not waking on multicast (and so
saving more power), or whether there is a way out of this compromise.

Are you cutting to the chase or just changing the movie? (Forgive the joke;
it is just to calm things down, as you suggested.) Could you please
point out which application or use case the issues you raised will disrupt or
hurt? Are these issues related to the current discussion or do they deserve
a new thread?

On Feb 20, 2008 7:42 AM, John Gilmore <gnu at toad.com> wrote:

> OK, children of the world, please calm down.  There are a few too many
> bugs and egos flaring up to come to a reasonable resolution.  This is
> an interdisciplinary problem that crosses too many architectural
> boundaries for any of us to be comfortable seeing the whole picture.
>
> I filed a bug report about the network failing to wake us on multicast
> four months ago (#4616).  A key response by dmwm2 a month ago provides
> a path forward:  http://dev.laptop.org/ticket/4616#comment:20 .
>
> Let me cut to the chase.
>
> Many things are likely to work, if update.1 turns on "wake on
> multicast" using the command "ethtool -s eth0 wol um", AND THE MESH IS
> NOT IN USE:
>
>  *  The laptop will suspend much of the time.
>  *  If someone sends it a multicast that it is listening for, it will
>     wake up and respond to the traffic (possibly dropping one packet).
>  *  Random multicast traffic that the laptop isn't listening for will
>     NOT wake it up.
>
> I hope that the people responsible for Presence and Sharing can
> test this, and make sure their protocols work with this "wol" setting.
> I don't know that stuff at all.  I'm not even sure what protocols
> are running in my laptops.  They have no school server.
>
> There are three bugs in update.1-691 around this:
>
>  *  The packet that awakens us doesn't get responded to; it was probably
>     dropped, rather than passed to the kernel.  Assuming the protocols
>     retry within 60 seconds, we'll see and respond to the second one.
>
>  *  When the laptop is manually suspended (physically closed), it
>     should not awaken for any reason except being reopened.  Instead,
>     it awakens for each received multicast packet that it is
>     listening for, and then goes immediately back to sleep.  This is
>     a power consumption bug.  I'd say ship the release and live with
>     it.
>
>  *  Receiving these multicasts while closed did also trigger
>     the laptop to refuse to stay resumed when I reopened it.  I had
>     to hit the power button to get it to stay on.  If cjb can
>     reproduce this reliably, he can fix it.  It happened twice
>     for me.  Merely closing and opening didn't fail, but closing,
>     sending a wakeup ping, then opening, did fail.
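
Since the first bug above means the waking packet itself gets no answer,
a sender has to retry. A minimal sketch of that, in Python, assuming the
caller supplies two hypothetical callbacks: "send" to transmit the request
and "poll_reply" to wait up to a timeout for an answer. The attempt count
and wait time are illustrative; the only figure given in the message is
that retries should happen within 60 seconds.

```python
from typing import Callable, Optional


def request_with_retry(send: Callable[[], None],
                       poll_reply: Callable[[float], Optional[bytes]],
                       attempts: int = 3,
                       wait_s: float = 2.0) -> Optional[bytes]:
    """Send a request up to `attempts` times, returning the first reply.

    The first transmission may only wake the peer and be dropped, so a
    missing reply is treated as "try again", not as failure.
    """
    for _ in range(attempts):
        send()
        reply = poll_reply(wait_s)  # returns None if nothing arrived in time
        if reply is not None:
            return reply
    return None  # peer never answered within attempts * wait_s seconds
```

The total of attempts * wait_s should stay inside whatever window the
protocol in question allows; that window is not specified here.
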
>
> All of the above works WHEN USING AN ACCESS POINT.
>
> There are several bugs in the mesh that prevent this from working
> over the mesh.  I recommend moving existing school deployments to
> access points, until we get the bugs out of the mesh.
>
> Details follow.
>
> There appears to be more than one bug in the mesh around multicast.  No
> wonder people are confused.  Using the same setup as in #4616, but
> *without* suspending, in update.1-691 I can't get multicast packets
> through reliably.  Setup:
>
>  * Two XO's, MP G1G1s.  One is using build 656, the other update.1-691.
>
>  * In NetworkManager screen, put both on "Mesh Network 1".  Wait a
>    few minutes for things to settle down.  Go to donut screen, make
>    sure both of them say "Mesh Network 1, Connected to a Simple
>    Mesh".
>
>  * Start a terminal on each laptop.  Become root.
>
>  * "ping6 -I msh0 ff02::1" on each laptop.
>
>  * This will ping the all-nodes multicast address.  The laptop that
>    sends this should get back a unicast IPv6 ping response from each
>    node on the network.  Keep moving the mouse on the update.1-691
>    laptop to avoid suspending.
>
>  * On each laptop, it can see itself (btw, ping6 prints its own
>    address on its first line of output).  It prints a very low
>    latency response (e.g. 0.154 ms) packet from its own kernel.  It
>    seldom or never sees a ping response from the other laptop.
>
>  * Bizarrely, every once in a while, the Build 656 laptop will see
>    ping responses from the update.1-691 laptop.  For about 10 seconds.
>    Then they will go away again.  They say "(DUP!)" because it's the
>    second response packet from a single outgoing ping packet.  Perhaps
>    these happen after it suspends and I resume it with mouse motion.
>
> If I stop the pings, go back into NetworkManager, and associate both
> XO's with a local access point (TrendNET TEW432-BRP), and replace
> "msh0" with "eth0", the test works.  The access point is doing NAT, so
> the only nodes on the network are wireless.  Oddly, for some reason,
> each machine sees TWO packets come back from the other machine (sample
> times: 5.51 ms and 6.37ms).  This is not a violation of the IP protocols
> -- datagrams are free to get replicated -- but it looks like a bug in
> either the Libertas or our kernel.  So I've found two bugs so far,
> and it's only by running simple commands and knowing what to expect.
>
> Now back to what I really wanted to test:  whether the driver support
> for wake-on-multicast works, and whether it only wakes up when the
> multicast packets match the filter.  See the month-old comment in #4616,
> http://dev.laptop.org/ticket/4616#comment:20 .  So, using the access
> point setup as above, I run:
>
>  *  "ethtool -s eth0 wol um" on the update.1-691 laptop.
>
>  *  I sit and wait for it to suspend.  Detected by power LED off and
>     occasionally blinking.
>
>  *  Now on the Build 656 machine, I run "ping6 -I eth0 ff02::2".  Note
>     the final "2", not a "1".  This pings the "all-routers" address
>     on the link local network.  I'm expecting no answering packets,
>     because there are no IPv6 routers on the local wireless LAN.
>     Indeed, not only do I get no answers, but the update.1-691 laptop
>     remains blissfully suspended.
>
>  *  I interrupt that and run "ping6 -I eth0 ff02::1", pinging the
>     "all-nodes" link local address.  This immediately wakes the
>     update.1-691 laptop out of suspend, and I get pings back from
>     both laptops.  The first packet is dropped by the suspended
>     machine, but I get three response packets, from the second ping
>     onward: one from the local machine, and two from the formerly
>     suspended laptop.
>
>  *  OK, perhaps the Libertas is braindead enough to know the all-nodes
>     address hard-wired, but this wouldn't work for a configured
>     multicast address.  So I did "ip maddr" to see which addresses
>     the kernel has instructed each interface to listen on.  I waited
>     for the update.1-691 laptop to suspend.  I pinged
>     "ff02::1:ff10:a958", which is the address that the suspended
>     laptop was listening on, PLUS ONE!  The laptop did not wake up.  I
>     then pinged the right address, "ff02::1:ff10:a957".  It awakened
>     immediately, and responded to the second ping (not the first, as
>     above).  Working! with a minor bug.
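
The "ff02::1:ff10:a957" address in the last test has the form of an IPv6
solicited-node multicast address: the fixed prefix ff02::1:ff00:0/104 with
the low 24 bits of one of the interface's unicast addresses appended. A
short Python sketch of that derivation follows. The laptop's unicast
address is not given in the message, so the one used in the usage note is
purely hypothetical; "solicited_node_group" is a name invented here.

```python
import ipaddress


def solicited_node_group(unicast):
    """Return the solicited-node multicast group for an IPv6 unicast address.

    The group is ff02::1:ff00:0/104 with the low 24 bits of the unicast
    address filled in.
    """
    addr = ipaddress.IPv6Address(unicast)
    low24 = int(addr) & 0xFFFFFF
    prefix = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(prefix | low24)
```

For a hypothetical link-local address ending in "10:a957", such as
"fe80::217:c4ff:fe10:a957", this gives "ff02::1:ff10:a957". An address
ending in "10:a958" gives the "plus one" group, which the suspended laptop
had not joined.
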
>
> It sounds like there are several bugs in the mesh, mentioned above,
> that will prevent this from working over the mesh.  I'll file bug
> reports for them.
>
> Wad wailed:
> > The partitioning we made between the network processor and the
> > main processor was pretty clean.  Unfortunately, it doesn't support
> > low power operation.   I suggest rethinking the partition for Gen2.
>
> I'm happy to rethink this for Gen2.  But I'd like to see more detailed
> support for "the partitioning doesn't support low power operation".
> Are you talking about the Libertas chip taking too much power by
> itself, or about the host CPU having to wake too often?  I think it's
> still too early to tell whether the partitioning is correct.  We've
> barely shipped working code, we don't have a working software release
> process, and have had no time to optimize for either power or clean
> lines.
>
>        John
>