Infrequent heap corruption, XO-4, Fedora 20

Jon Nettleton jon.nettleton at gmail.com
Thu Feb 5 02:11:10 EST 2015


On Thu, Feb 5, 2015 at 8:00 AM, James Cameron <quozl at laptop.org> wrote:

> Thanks.
>
> Can I make it happen more often?
>
> Is there a later version of the driver?
>
> We have a different version that I may look into, on arm-3.5-android
> branch.
>
>
Run memtester against the majority of your machine's memory, then run
gtkperf in an X session.  That is usually enough to trigger it.
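
A rough sketch of that reproducer in Python, in case it helps to script
it.  The assumptions here are mine: "the majority" is taken as 75% of
MemTotal, memtester is left running while gtkperf runs, and gtkperf's
-a (automatic) option is used, which should be checked against the
installed version.  The helper names are made up for illustration, and
memtester generally needs root to lock that much memory.

```python
import re
import subprocess


def mem_total_kib(meminfo_text):
    """Return the MemTotal value in KiB from /proc/meminfo-style text."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no MemTotal line found")
    return int(match.group(1))


def memtester_command(meminfo_text, fraction=0.75):
    """Build a memtester argv covering `fraction` of total RAM.

    The 0.75 default is an assumption standing in for "the majority".
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1")
    megabytes = int(mem_total_kib(meminfo_text) * fraction) // 1024
    # No loop count is given, so memtester keeps going until it is stopped.
    return ["memtester", "{}M".format(megabytes)]


def run_reproducer(fraction=0.75):
    """Hold memory with memtester while gtkperf runs in the X session."""
    with open("/proc/meminfo") as meminfo:
        command = memtester_command(meminfo.read(), fraction)
    stress = subprocess.Popen(command)
    try:
        # "-a" (run all tests, then exit) is assumed; verify locally.
        subprocess.call(["gtkperf", "-a"])
    finally:
        stress.terminate()
        stress.wait()
```

Calling run_reproducer() from a terminal inside the X session, possibly
several times, is the intended use.  Nothing runs on import.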

Considering that the bug exists in all the 4.xx Vivante galcore drivers
I have seen, I doubt it is fixed in the other version.  Android is much
simpler on memory because it runs everything through a single GL context
against a framebuffer.

I have some tentative patches to fix parts of it in my trees but I doubt a
lot of them would apply to 3.5 without backporting a lot of upstream work.



> On Wed, Feb 04, 2015 at 12:14:02PM +0100, Jon Nettleton wrote:
> > It is a problem with the v4 version of the galcore driver.  We have
> > replicated it on a couple of platforms.
> >
> > On Wed, Feb 4, 2015 at 11:26 AM, Peter Robinson
> > <[1]pbrobinson at gmail.com> wrote:
> >
> >     On Wed, Feb 4, 2015 at 8:10 AM, James Cameron
> >     <[2]quozl at laptop.org> wrote:
> >     > Following up a thread from last September.
> >     >
> >     > This problem has just become more interesting, because it hit
> >     > during an activity startup.
> >     >
> >     > I'm quite used to seeing it with yum.  But seeing it without yum
> >     > now points us at kernel, glibc or python.
> >
> >     We've not seen this in the wider F-20 Fedora ARM distro so my bet
> >     would be on the kernel.
> >
> >     Peter
> >
> >     > [3]http://dev.laptop.org/ticket/12837#comment:4 has the details
> >     > of the most recent event.
> >     >
> >     > On Wed, Sep 10, 2014 at 01:56:27PM +1000, James Cameron wrote:
> >     >> G'day Peter,
> >     >>
> >     >> Thanks for any ideas you may have.
> >     >>
> >     >> The problem also reproduces on OLPC Fedora 20 image for XO-4:
> >     >>
> >     >> [4]http://build.laptop.org/14.1.0/os1/xo-4/41001o4.zd (552 MB)
> >     >>
> >     >> *** Error in `/usr/bin/python': free(): invalid pointer: 0x047c79ae ***
> >     >> ======= Backtrace: =========
> >     >> /lib/libc.so.6(+0x6c8b4)[0xb6c828b4]
> >     >> /lib/libc.so.6(+0x754e8)[0xb6c8b4e8]
> >     >> ======= Memory map: ========
> >     >> [...]
> >     >>
> >     >> The error varies in detail, but always suggests corruption of
> >     >> heap or pointers to heap.
> >     >>
> >     >> The triggering conditions are interactive use of yum, yum update,
> >     >> or yum used by olpc-os-builder.  The latter is a simple
> >     >> reproducer for me.
> >     >>
> >     >> I'm reproducing it on an XO-4, with 2GB of RAM, no swap, 8 GB
> >     >> eMMC, 8 GB USB flash drive.
> >     >>
> >     >> While memory demand by yum is large by comparison to other
> >     >> programs, the available memory at the time of failure is ample.
> >     >> There are no kernel out of memory (OOM) events.  It seems more
> >     >> likely to occur when the filesystem cache is under heavy demand.
> >     >>
> >     >> The method to recreate the problem was:
> >     >>
> >     >> 1.  install the system image 41001o4.zd using fs-update and then
> >     >> boot,
> >     >>
> >     >> 2.  configure wireless network,
> >     >>
> >     >> 3.  "yum install -y git olpc-os-builder"
> >     >>
> >     >> 4.  clone the master branch of
> >     >> git://[5]dev.laptop.org/projects/olpc-os-builder
> >     >> (last verified with b87e6ee)
> >     >>
> >     >> 5.  run "./osbuilder.py examples/olpc-os-14.1.0-xo4.ini"
> >     >> repeatedly until the error occurs (usually within about five
> >     >> attempts),
> >     >>
> >     >>
> >     >> I've also tried running under valgrind, but that causes illegal
> >     >> instruction.  It is quite likely I'm not using valgrind correctly.
> >     >> [6]http://dev.laptop.org/~quozl/z/1XRYtO.txt
> >     >>
> >     >> The workaround at the moment is to build our Fedora 20 images on
> >     >> Fedora 18.  Fedora 18 shows no sign of the problem.  I'm worried
> >     >> that a low probability heap corruptor may cause instability of
> >     >> applications in the field.
> >     >>
> >     >> The exact same kernel is being used for Fedora 18 and Fedora 20.
> >     >>
> >     >> On Tue, Sep 09, 2014 at 03:55:24PM +0100, Peter Robinson wrote:
> >     >> > What version of OOB are you using, and what config files? I can
> >     >> > try and recreate the problem here on other devices.
> >     >>
> >     >> --
> >     >> James Cameron
> >     >> [7]http://quozl.linux.org.au/
> >     >
> >     > --
> >     > James Cameron
> >     > [8]http://quozl.linux.org.au/
> >     _______________________________________________
> >     Devel mailing list
> >     [9]Devel at lists.laptop.org
> >     [10]http://lists.laptop.org/listinfo/devel
> >
> > References:
> >
> > [1] mailto:pbrobinson at gmail.com
> > [2] mailto:quozl at laptop.org
> > [3] http://dev.laptop.org/ticket/12837#comment:4
> > [4] http://build.laptop.org/14.1.0/os1/xo-4/41001o4.zd
> > [5] http://dev.laptop.org/projects/olpc-os-builder
> > [6] http://dev.laptop.org/~quozl/z/1XRYtO.txt
> > [7] http://quozl.linux.org.au/
> > [8] http://quozl.linux.org.au/
> > [9] mailto:Devel at lists.laptop.org
> > [10] http://lists.laptop.org/listinfo/devel
>
> --
> James Cameron
> http://quozl.linux.org.au/
>