#6529 NORM Never A: Multicast ping over eth0 (not mesh) sometimes produces duplicate packets
Zarro Boogs per Child
bugtracker at laptop.org
Wed Feb 20 09:15:14 EST 2008
#6529: Multicast ping over eth0 (not mesh) sometimes produces duplicate packets
--------------------+-------------------------------------------------------
Reporter: gnu | Owner: dilinger
Type: defect | Status: new
Priority: normal | Milestone: Never Assigned
Component: kernel | Version: Development build as of this date
Keywords: | Verified: 0
Blocking: | Blockedby:
--------------------+-------------------------------------------------------
I found this problem while trying to reproduce #4616 on modern hardware
and software.
Setup:
* Two XOs, both MP G1G1 units. One is running build 656, the other
update.1-691 (the "target" machine).
* In the !NetworkManager screen, put both laptops on my local access point
(TrendNET TEW432-BRP). Wait a few minutes for things to settle down. Go
to the donut screen and make sure both of them say they're on the access
point.
* Start a terminal on each laptop. Become root.
* (Optional:) On the update.1-691 machine, do "ethtool -s eth0 wol um".
This enables wakeups on unicast ("u") and multicast ("m") packets.
* "ping6 -I eth0 ff02::1" on the other machine.
* This will ping the all-nodes multicast address. The laptop that
sends this should get back a unicast IPv6 ping response from each node on
the network. Keep moving the mouse on the update.1-691 laptop to avoid
suspending.
* The laptop running the ping sees itself (btw, ping6 prints its own
address on its first line of output): it prints a very low latency
response (e.g. 0.154 ms) from its own kernel. It should also print exactly
one "(DUP!)" response per original packet, from the other laptop.
Sometimes it does. At other times it does this:
{{{
bash-3.2# ping6 -I eth0 ff02::1
PING ff02::1(ff02::1) from fe80::217:c4ff:fe11:1d3c eth0: 56 data bytes
64 bytes from fe80::217:c4ff:fe11:1d3c: icmp_seq=1 ttl=64 time=0.130 ms
64 bytes from fe80::317:c4ff:fe10:a957: icmp_seq=1 ttl=64 time=27.0 ms (DUP!)
64 bytes from fe80::317:c4ff:fe10:a957: icmp_seq=1 ttl=64 time=27.6 ms (DUP!)
64 bytes from fe80::217:c4ff:fe11:1d3c: icmp_seq=2 ttl=64 time=0.208 ms
64 bytes from fe80::317:c4ff:fe10:a957: icmp_seq=2 ttl=64 time=5.31 ms (DUP!)
64 bytes from fe80::317:c4ff:fe10:a957: icmp_seq=2 ttl=64 time=6.43 ms (DUP!)
64 bytes from fe80::217:c4ff:fe11:1d3c: icmp_seq=3 ttl=64 time=0.222 ms
64 bytes from fe80::317:c4ff:fe10:a957: icmp_seq=3 ttl=64 time=6.80 ms (DUP!)
64 bytes from fe80::317:c4ff:fe10:a957: icmp_seq=3 ttl=64 time=7.82 ms (DUP!)
--- ff02::1 ping statistics ---
3 packets transmitted, 3 received, +6 duplicates, 0% packet loss, time 2007ms
rtt min/avg/max/mdev = 0.130/9.072/27.633/10.181 ms
bash-3.2#
}}}
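To make the expected and observed behaviour easier to compare, here is a minimal Python sketch that counts replies per sequence number and per responder in captured ping6 output. The function names (count_replies, repeated_replies, ping_style_duplicates) are hypothetical helpers written for this ticket, not part of ping6 or any other tool, and the regular expression assumes the reply-line format shown in the transcript above. It also assumes that ping6 flags every reply after the first one for a given icmp_seq as "(DUP!)", which is consistent with the "+6 duplicates" in the statistics line.

```python
import re
from collections import Counter

# Matches reply lines such as:
#   64 bytes from fe80::217:c4ff:fe11:1d3c: icmp_seq=1 ttl=64 time=0.130 ms
REPLY_RE = re.compile(r"bytes from (?P<src>[0-9a-fA-F:]+): icmp_seq=(?P<seq>\d+) ")

# Reply lines copied from the transcript in this ticket.
SAMPLE = """\
64 bytes from fe80::217:c4ff:fe11:1d3c: icmp_seq=1 ttl=64 time=0.130 ms
64 bytes from fe80::317:c4ff:fe10:a957: icmp_seq=1 ttl=64 time=27.0 ms (DUP!)
64 bytes from fe80::317:c4ff:fe10:a957: icmp_seq=1 ttl=64 time=27.6 ms (DUP!)
64 bytes from fe80::217:c4ff:fe11:1d3c: icmp_seq=2 ttl=64 time=0.208 ms
64 bytes from fe80::317:c4ff:fe10:a957: icmp_seq=2 ttl=64 time=5.31 ms (DUP!)
64 bytes from fe80::317:c4ff:fe10:a957: icmp_seq=2 ttl=64 time=6.43 ms (DUP!)
64 bytes from fe80::217:c4ff:fe11:1d3c: icmp_seq=3 ttl=64 time=0.222 ms
64 bytes from fe80::317:c4ff:fe10:a957: icmp_seq=3 ttl=64 time=6.80 ms (DUP!)
64 bytes from fe80::317:c4ff:fe10:a957: icmp_seq=3 ttl=64 time=7.82 ms (DUP!)
"""


def count_replies(ping_output):
    """Return a Counter mapping (icmp_seq, source address) to reply count."""
    counts = Counter()
    for line in ping_output.splitlines():
        match = REPLY_RE.search(line)
        if match:
            counts[(int(match.group("seq")), match.group("src"))] += 1
    return counts


def repeated_replies(counts):
    """Return the (seq, source) pairs that answered more than once."""
    return {key: n for key, n in counts.items() if n > 1}


def ping_style_duplicates(counts):
    """Count replies beyond the first one per sequence number.

    Assumption: this mirrors how ping6 arrives at its "+N duplicates" figure.
    """
    distinct_seqs = {seq for seq, _src in counts}
    return sum(counts.values()) - len(distinct_seqs)


counts = count_replies(SAMPLE)
repeated = repeated_replies(counts)
print(ping_style_duplicates(counts))  # 6, matching "+6 duplicates"
for (seq, src), n in sorted(repeated.items()):
    print(f"icmp_seq={seq}: {n} replies from {src}")
```

In a healthy run, repeated_replies would come back empty: each node would answer each sequence number once, and only the per-sequence "(DUP!)" from the other laptop would remain. In the transcript above, the other laptop shows up twice for every sequence number.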
Duplicating packets does not violate the IP protocol. Higher-level
protocols must be able to cope with duplicates (whether sent twice by the
sender, e.g. as a retry, or duplicated in flight by some network node).
But it creates unnecessary traffic, so I consider this a performance bug.
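As an illustration of what "coping with duplicates" can look like at a higher layer, here is a minimal Python sketch of a receiver that delivers each (sender, sequence number) pair at most once. The DuplicateFilter class is hypothetical and is not taken from the kernel, the Libertas firmware, or any real protocol stack. It keeps every pair it has seen, whereas a real protocol would typically bound that state with a sliding window.

```python
class DuplicateFilter:
    """Deliver each (sender, sequence number) pair at most once."""

    def __init__(self):
        # Unbounded on purpose to keep the sketch short.
        self._seen = set()

    def accept(self, sender, seq):
        """Return True the first time a pair is seen, False for repeats."""
        key = (sender, seq)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


flt = DuplicateFilter()
other = "fe80::317:c4ff:fe10:a957"
results = [flt.accept(other, 1), flt.accept(other, 1), flt.accept(other, 2)]
print(results)  # [True, False, True]
```

The duplicate is dropped correctly, but it still crossed the network, which is the cost this ticket is about.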
I do not know exactly what conditions trigger this bug.
This could be a kernel problem, or could be a Libertas firmware problem.
I'm starting by reporting it as a kernel problem (partly because there's no
category for mesh or Libertas bugs). It should be possible to produce a
smoking gun in the kernel logs that shows whether the chip ever gives the
kernel the duplicate packets.
By the way, if you do this test when configured to use the Mesh rather
than an access point, and you use multicast, you encounter a worse bug,
#6527.
--
Ticket URL: <http://dev.laptop.org/ticket/6529>
One Laptop Per Child <http://dev.laptop.org>
OLPC bug tracking system