Infrequent heap corruption, XO-4, Fedora 20
Jon Nettleton
jon.nettleton at gmail.com
Thu Feb 5 02:11:10 EST 2015
On Thu, Feb 5, 2015 at 8:00 AM, James Cameron <quozl at laptop.org> wrote:
> Thanks.
>
> Can I make it happen more often?
>
> Is there a later version of the driver?
>
> We have a different version that I may look into, on the arm-3.5-android
> branch.
>
>
Run memtester against the majority of your machine's memory and then run
gtkperf in an X session. That is usually enough to trigger it.
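The memtester-plus-gtkperf recipe can be scripted. What follows is a minimal sketch, not something posted in this thread. It assumes that "the majority" of memory means 80% of MemTotal (a hypothetical figure), that memtester and gtkperf are installed and on PATH, and that the script runs as root inside an X session. The 10-second head start for memtester and the helper names are also illustrative assumptions.

```python
#!/usr/bin/env python3
"""Hypothetical stress driver: hold most of RAM with memtester, then run gtkperf."""
import re
import shutil
import subprocess
import time

# Assumption: "the majority" of memory is taken to mean 80% of MemTotal.
PERCENT = 80


def mem_total_kib(meminfo_text):
    """Return MemTotal in KiB from the text of /proc/meminfo."""
    match = re.search(r"^MemTotal:\s+(\d+) kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found")
    return int(match.group(1))


def memtester_size_arg(total_kib, percent=PERCENT):
    """Format a memtester size argument in megabytes (M suffix), e.g. '1600M'."""
    return "%dM" % (total_kib * percent // 100 // 1024)


def main():
    missing = [tool for tool in ("memtester", "gtkperf") if shutil.which(tool) is None]
    if missing:
        print("not installed: " + ", ".join(missing))
        return
    with open("/proc/meminfo") as f:
        size = memtester_size_arg(mem_total_kib(f.read()))
    # With no loop count, memtester runs until it is killed; it usually
    # needs root to lock the memory it allocates.
    tester = subprocess.Popen(["memtester", size])
    try:
        time.sleep(10)  # arbitrary pause so memtester can allocate first
        # gtkperf is a GTK program and needs an X display (DISPLAY set).
        subprocess.call(["gtkperf"])
    finally:
        tester.terminate()
        tester.wait()


if __name__ == "__main__":
    main()
```

On the 2 GB XO-4 described below, the 80% assumption gives memtester roughly 1.6 GB to hold while gtkperf runs.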
Considering that this bug exists in all the 4.xx Vivante galcore drivers I
have seen, I doubt it is fixed in the other version. Android is much
simpler on memory because it runs everything through a single GL context
against a framebuffer.
I have some tentative patches to fix parts of it in my trees, but I doubt
a lot of them would apply to 3.5 without backporting a lot of upstream work.
> On Wed, Feb 04, 2015 at 12:14:02PM +0100, Jon Nettleton wrote:
> > It is a problem with the v4 version of the galcore driver. We have
> > replicated it on a couple of platforms.
> >
> > On Wed, Feb 4, 2015 at 11:26 AM, Peter Robinson <[1]pbrobinson at gmail.com>
> > wrote:
> >
> > On Wed, Feb 4, 2015 at 8:10 AM, James Cameron <[2]quozl at laptop.org>
> > wrote:
> > > Following up a thread from last September.
> > >
> > > This problem has just become more interesting, because it hit during
> > > an activity startup.
> > >
> > > I'm quite used to seeing it with yum. But seeing it without yum now
> > > points us at the kernel, glibc or Python.
> >
> > We've not seen this in the wider F-20 Fedora ARM distro, so my bet
> > would be on the kernel.
> >
> > Peter
> >
> > > [3]http://dev.laptop.org/ticket/12837#comment:4 has the details of the
> > > most recent event.
> > >
> > > On Wed, Sep 10, 2014 at 01:56:27PM +1000, James Cameron wrote:
> > >> G'day Peter,
> > >>
> > >> Thanks for any ideas you may have.
> > >>
> > >> The problem also reproduces on the OLPC Fedora 20 image for XO-4:
> > >>
> > >> [4]http://build.laptop.org/14.1.0/os1/xo-4/41001o4.zd (552 MB)
> > >>
> > >> *** Error in `/usr/bin/python': free(): invalid pointer: 0x047c79ae ***
> > >> ======= Backtrace: =========
> > >> /lib/libc.so.6(+0x6c8b4)[0xb6c828b4]
> > >> /lib/libc.so.6(+0x754e8)[0xb6c8b4e8]
> > >> ======= Memory map: ========
> > >> [...]
> > >>
> > >> The error varies in detail, but always suggests corruption of the heap
> > >> or of pointers into the heap.
> > >>
> > >> The triggering conditions are interactive use of yum, yum update, or
> > >> yum used by olpc-os-builder. The latter is a simple reproducer for me.
> > >>
> > >> I'm reproducing it on an XO-4 with 2 GB of RAM, no swap, 8 GB eMMC
> > >> and an 8 GB USB flash drive.
> > >>
> > >> While memory demand by yum is large by comparison to other programs,
> > >> the available memory at the time of failure is ample. There are no
> > >> kernel out of memory (OOM) events. It seems more likely to occur when
> > >> the filesystem cache is under heavy demand.
> > >>
> > >> The method to recreate the problem was:
> > >>
> > >> 1. install the system image 41001o4.zd using fs-update and then boot,
> > >>
> > >> 2. configure wireless network,
> > >>
> > >> 3. "yum install -y git olpc-os-builder"
> > >>
> > >> 4. clone the master branch of
> > >> git://[5]dev.laptop.org/projects/olpc-os-builder
> > >> (last verified with b87e6ee)
> > >>
> > >> 5. run "./osbuilder.py examples/olpc-os-14.1.0-xo4.ini" repeatedly
> > >> until the error occurs (usually within about five attempts).
> > >>
> > >> I've also tried running under valgrind, but that causes an illegal
> > >> instruction. It is quite likely I'm not using valgrind correctly.
> > >> [6]http://dev.laptop.org/~quozl/z/1XRYtO.txt
> > >>
> > >> The workaround at the moment is to build our Fedora 20 images on
> > >> Fedora 18. Fedora 18 shows no sign of the problem. I'm worried that
> > >> a low-probability heap corruptor may cause instability of
> > >> applications in the field.
> > >>
> > >> The exact same kernel is being used for Fedora 18 and Fedora 20.
> > >>
> > >> On Tue, Sep 09, 2014 at 03:55:24PM +0100, Peter Robinson wrote:
> > >> > What version of OOB are you using, and what config files? I can
> try
> > >> > and recreate the problem here on other devices.
> > >>
> > >> --
> > >> James Cameron
> > >> [7]http://quozl.linux.org.au/
> > >
> > > --
> > > James Cameron
> > > [8]http://quozl.linux.org.au/
> > _______________________________________________
> > Devel mailing list
> > [9]Devel at lists.laptop.org
> > [10]http://lists.laptop.org/listinfo/devel
> >
> > References:
> >
> > [1] mailto:pbrobinson at gmail.com
> > [2] mailto:quozl at laptop.org
> > [3] http://dev.laptop.org/ticket/12837#comment:4
> > [4] http://build.laptop.org/14.1.0/os1/xo-4/41001o4.zd
> > [5] http://dev.laptop.org/projects/olpc-os-builder
> > [6] http://dev.laptop.org/~quozl/z/1XRYtO.txt
> > [7] http://quozl.linux.org.au/
> > [8] http://quozl.linux.org.au/
> > [9] mailto:Devel at lists.laptop.org
> > [10] http://lists.laptop.org/listinfo/devel
>
> --
> James Cameron
> http://quozl.linux.org.au/
>
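Step 5 of the quoted September recipe ("run osbuilder.py repeatedly until the error occurs") can be automated. This is a minimal sketch, not part of the original thread. It assumes Python 3.5 or later, a working directory that is an olpc-os-builder checkout, and an arbitrary limit of 10 attempts. The function names are hypothetical. It sets glibc's LIBC_FATAL_STDERR_ environment variable because, by default, glibc writes the "*** Error in" banner to the controlling terminal rather than to stderr, where a parent process could not capture it.

```python
#!/usr/bin/env python3
"""Hypothetical harness: rerun olpc-os-builder until glibc reports heap corruption."""
import os
import re
import subprocess

# glibc's malloc checks print a banner such as:
#   *** Error in `/usr/bin/python': free(): invalid pointer: 0x047c79ae ***
GLIBC_ERROR = re.compile(r"\*\*\* Error in `([^']*)': (.+?): (0x[0-9a-fA-F]+) \*\*\*")

# Assumptions: run from an olpc-os-builder checkout; the attempt limit is arbitrary.
COMMAND = ["./osbuilder.py", "examples/olpc-os-14.1.0-xo4.ini"]
MAX_ATTEMPTS = 10


def find_glibc_error(output):
    """Return (program, message, address) for the first glibc banner, or None."""
    match = GLIBC_ERROR.search(output)
    return match.groups() if match else None


def run_until_failure(command=COMMAND, max_attempts=MAX_ATTEMPTS):
    """Rerun command; return the attempt number that hit a glibc error, else None."""
    # Send glibc's fatal messages to stderr instead of /dev/tty so they are captured.
    env = dict(os.environ, LIBC_FATAL_STDERR_="1")
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(command, env=env, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, universal_newlines=True)
        error = find_glibc_error(result.stdout)
        if error is not None:
            program, message, address = error
            print("attempt %d: %s: %s at %s" % (attempt, program, message, address))
            return attempt
        print("attempt %d: no glibc error (exit status %d)" % (attempt, result.returncode))
    return None


def main():
    if not os.path.exists(COMMAND[0]):
        print("run this from an olpc-os-builder checkout")
        return
    if run_until_failure() is None:
        print("no heap corruption seen in %d attempts" % MAX_ATTEMPTS)


if __name__ == "__main__":
    main()
```

Because the environment is inherited, the variable also reaches the yum processes that olpc-os-builder starts, which is where the quoted failures were seen.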