#6529 NORM Never A: Multicast ping over eth0 (not mesh) sometimes produces duplicate packets

Wed Feb 20 09:15:14 EST 2008

#6529: Multicast ping over eth0 (not mesh) sometimes produces duplicate packets
--------------------+-------------------------------------------------------
 Reporter:  gnu     |       Owner:  dilinger                         
     Type:  defect  |      Status:  new                              
 Priority:  normal  |   Milestone:  Never Assigned                   
Component:  kernel  |     Version:  Development build as of this date
 Keywords:          |    Verified:  0                                
 Blocking:          |   Blockedby:                                   
--------------------+-------------------------------------------------------
 I found this problem while trying to reproduce #4616 on modern hardware
 and software.

     Setup:

     * Two XOs, both MP G1G1 units. One runs build 656, the other
 update.1-691 (the "target" machine).

     * In !NetworkManager screen, put both laptops on my local access point
 (TrendNET TEW432-BRP).  Wait a few minutes for things to settle down. Go
 to donut screen, make sure both of them say they're on the access point.

     * Start a terminal on each laptop. Become root.

     * (Optional:)  On the update.1-691 machine, do "ethtool -s eth0 wol
 um".  This enables wakeups on unicast and multicast packets.

     * Run "ping6 -I eth0 ff02::1" on the other machine.

     * This will ping the all-nodes multicast address. The laptop that
 sends this should get back a unicast IPv6 ping response from each node on
 the network. Keep moving the mouse on the update.1-691 laptop to avoid
 suspending.

     * Each laptop sees itself (btw, ping6 prints its own address on its
 first line of output): it prints a very low latency (e.g. 0.154 ms)
 response from its own kernel.  It should also print exactly one "(DUP!)"
 response per original packet, from the other laptop.  Sometimes it does.
 At other times it prints two, like this:
 {{{
  bash-3.2# ping6 -I eth0 ff02::1
  PING ff02::1(ff02::1) from fe80::217:c4ff:fe11:1d3c eth0: 56 data bytes
  64 bytes from fe80::217:c4ff:fe11:1d3c: icmp_seq=1 ttl=64 time=0.130 ms
  64 bytes from fe80::317:c4ff:fe10:a957: icmp_seq=1 ttl=64 time=27.0 ms (DUP!)
  64 bytes from fe80::317:c4ff:fe10:a957: icmp_seq=1 ttl=64 time=27.6 ms (DUP!)
  64 bytes from fe80::217:c4ff:fe11:1d3c: icmp_seq=2 ttl=64 time=0.208 ms
  64 bytes from fe80::317:c4ff:fe10:a957: icmp_seq=2 ttl=64 time=5.31 ms (DUP!)
  64 bytes from fe80::317:c4ff:fe10:a957: icmp_seq=2 ttl=64 time=6.43 ms (DUP!)
  64 bytes from fe80::217:c4ff:fe11:1d3c: icmp_seq=3 ttl=64 time=0.222 ms
  64 bytes from fe80::317:c4ff:fe10:a957: icmp_seq=3 ttl=64 time=6.80 ms (DUP!)
  64 bytes from fe80::317:c4ff:fe10:a957: icmp_seq=3 ttl=64 time=7.82 ms (DUP!)

  --- ff02::1 ping statistics ---
  3 packets transmitted, 3 received, +6 duplicates, 0% packet loss, time 2007ms
  rtt min/avg/max/mdev = 0.130/9.072/27.633/10.181 ms
  bash-3.2#

 }}}
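
 Since ping6 tags every reply after the first one for a given icmp_seq as
 "(DUP!)", the tag alone does not separate the expected reply from the other
 laptop from a genuinely repeated one. A rough way to tell them apart is to
 count replies per (source address, icmp_seq) pair. The following is only a
 sketch: the function names and the regular expression are my own
 assumptions, and it expects output shaped like the transcript above.

```python
import re
from collections import Counter

# Matches reply lines such as:
#   64 bytes from fe80::1: icmp_seq=1 ttl=64 time=0.130 ms
# Lines that do not look like replies (headers, statistics, a wrapped
# "(DUP!)" on its own line) are ignored.
REPLY_RE = re.compile(r"bytes from (\S+): icmp_seq=(\d+)")


def count_replies(ping_output):
    """Return a Counter mapping (source, seq) to the number of replies seen."""
    counts = Counter()
    for line in ping_output.splitlines():
        match = REPLY_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


def extra_replies(counts):
    """Return only the (source, seq) pairs that were answered more than once."""
    return {key: n for key, n in counts.items() if n > 1}


# Three reply lines taken from the transcript in this ticket.
SAMPLE = """\
64 bytes from fe80::217:c4ff:fe11:1d3c: icmp_seq=1 ttl=64 time=0.130 ms
64 bytes from fe80::317:c4ff:fe10:a957: icmp_seq=1 ttl=64 time=27.0 ms (DUP!)
64 bytes from fe80::317:c4ff:fe10:a957: icmp_seq=1 ttl=64 time=27.6 ms (DUP!)
"""

if __name__ == "__main__":
    for (src, seq), n in sorted(extra_replies(count_replies(SAMPLE)).items()):
        print(f"{src} answered icmp_seq={seq} {n} times")
```

 Any pair reported by extra_replies() is a node that answered the same
 request more than once, which is the behaviour this ticket is about.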

 Duplicating packets does not violate the IP protocol.  Higher level
 protocols must be able to cope with duplicates (whether sent twice by the
 sender, e.g. as a retry, or whether duplicated in flight by some network
 node).  But it creates unnecessary traffic.  I consider this a performance
 bug.
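
 As an illustration of what "coping with duplicates" can look like at a
 higher layer, here is a minimal sketch of receiver-side suppression keyed on
 (source, sequence number). It is not taken from any particular protocol; the
 function name and the packet tuple layout are hypothetical, and a real
 implementation would bound the "seen" set, for instance with a sliding
 window.

```python
def deliver_once(packets):
    """Yield each payload once, dropping repeats of the same (source, seq).

    `packets` is an iterable of (source, seq, payload) tuples; this layout
    is an assumption made for the example.
    """
    seen = set()
    for source, seq, payload in packets:
        key = (source, seq)
        if key in seen:
            continue  # duplicate: this (source, seq) was already delivered
        seen.add(key)
        yield payload


if __name__ == "__main__":
    arrivals = [("a", 1, "x"), ("a", 1, "x"), ("b", 1, "y"), ("a", 2, "z")]
    print(list(deliver_once(arrivals)))
```

 The receiver still has to process every copy before discarding it, so the
 extra traffic is not free even when the duplicates are harmless.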

 I do not know exactly what conditions trigger this bug.

 This could be a kernel problem, or it could be a Libertas firmware problem.
 I'm starting by reporting it as a kernel problem (partly because there's no
 category for mesh or Libertas bugs). It should be possible to produce a
 smoking gun in the kernel logs that shows whether the chip itself hands the
 kernel the duplicate packets.

 By the way, if you do this test when configured to use the Mesh rather
 than an access point, and you use multicast, you encounter a worse bug,
 #6527.

-- 
Ticket URL: <http://dev.laptop.org/ticket/6529>
One Laptop Per Child <http://dev.laptop.org>
OLPC bug tracking system