Where olpc machine spending time when using web browser

Tue Mar 13 13:04:22 EDT 2007

William Cohen wrote:

> William Cohen wrote:
>
>> Looked at where the processor spends its time when browsing the web.
>>
>> Hardware configuration:
>>
>>  OLPC Beta 2 machine
>>  Linksys USB200M USB 10/100 for ethernet connection
>>  4GB memorex Mini Travel Drive for storage of image
>>
>>
>> Software configuration:
>>
>>  /tmp/olpc-redhat-stream-development-build-299-20070308_1417-devel_ext3.img
>>  kernel-2.6.21-20070309.olpc1p.dc5079fafb767e4
>>  oprofile-0.9.2-3.fc6
>
>
>
> Reran the experiment on build 301 and installed the
> xorg-x11-server-debuginfo-1.1.99.3-0.10.2.olpc1.i386.rpm on the olpc 
> machine, so I could take a look at where time is being spent in libfb.so.

I don't know which gcc version and options were used to compile the
packages.  If somebody points me to where I can check this, I could be
more sure.  It looks to me like the packages were compiled without
tuning gcc for geode.  The div and mod insns are expensive on geode.
Whether gcc uses a div or shifts is chosen in gcc's expmed.c, and this
is directed by the costs selected by -mtune or -march.
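
To illustrate the point about div versus shifts, here is a minimal C
sketch.  The function names are hypothetical and are not taken from the
X server sources, and 256 is an arbitrary example width.  When the
divisor is only known at run time, the compiler has to emit a real
divide; when it is a constant power of two, gcc can replace the
unsigned modulo with a mask.

```c
#include <assert.h>

/* Hypothetical helpers for illustration; not from the X server sources. */

/* Divisor known only at run time: the compiler has to emit a divide. */
static unsigned wrap_runtime(unsigned x, unsigned width)
{
    return x % width;
}

/* Divisor is a compile-time power of two: the compiler can use a mask. */
static unsigned wrap_const256(unsigned x)
{
    return x % 256u;
}

/* Hand-written mask; valid only when width is a power of two. */
static unsigned wrap_mask(unsigned x, unsigned width)
{
    return x & (width - 1u);
}

/* Checks that all three variants agree for a width of 256. */
static void check_wrap(void)
{
    unsigned x;
    for (x = 0; x < 100000u; x += 37u) {
        assert(wrap_runtime(x, 256u) == wrap_const256(x));
        assert(wrap_mask(x, 256u) == wrap_const256(x));
    }
}
```

For constant divisors that are not a power of two, the choice between
a divide and a multiply/shift sequence may depend on the cost tables,
so comparing the output of gcc -O2 -S under different -mtune values is
one way to see the effect.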

I already did the gcc tuning for geode (pipeline description, code
costs, i386 port parameter values) and submitted it to the gcc
mainline.  As far as I know, Jakub Jelinek was going to backport this
code to the Red Hat gcc.  So my guess is that if the right compiler and
options are used, the code will be faster (and several % smaller,
because -mtune=geode generates smaller code than any other tuning).

If somebody needs help speeding up some (critical) code for OLPC by
choosing the right options (like use of mmx insns, vectorization, and
numerous other possibilities), I could help too.  Please let me know.
If I have an OLPC machine, I can do it.

>
> # opreport -t 1 -l /usr/bin/Xorg
> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
> Profiling through timer interrupt
> samples  %        image name               symbol name
> 6514     68.1096  libfb.so                 fbFetchTransformed
> 613       6.4095  libfb.so                 fbFetchPixel_x8r8g8b8
> 446       4.6633  libfb.so                 fbCompositeSolidMask_nx8x0565mmx
> 252       2.6349  libfb.so                 fbStore_r5g6b5
> 169       1.7670  libfb.so                 fbRasterizeEdges
> 137       1.4325  libfb.so                 fbCompositeSrc_8888x0565mmx
> 113       1.1815  libfb.so                 fbCopyAreammx
> 99        1.0351  libfb.so                 mmxCombineOverU
>
> The attached file is a portion of the output from opannotate. There is 
> a group of MOD operations that are taking a significant portion of the 
> time. The first column is the number of samples and the second column 
> is the percentage.
>
>    398  6.1099 :                        x1 = MOD (x1, pict->pDrawable->width);
>    383  5.8796 :                        x2 = MOD (x2, pict->pDrawable->width);
>    336  5.1581 :                        y1 = MOD (y1, pict->pDrawable->height);
>    355  5.4498 :                        y2 = MOD (y2, pict->pDrawable->height);
>
> Following this there are also some other expensive operations to
> compute r and put it into buffer[i].
>
> -Will
>
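
As a rough sketch of how the four MOD operations above might be made
cheaper independently of compiler tuning: the helpers below are
hypothetical and not from libfb.  wrap_coord skips the divide when the
coordinate is already in range, and wrap_next assumes x2 is x1 + 1
before wrapping (the excerpt above does not show this), so the second
wrap needs only a compare.  Whether this helps on geode would need to
be measured.

```c
#include <assert.h>

/* Hypothetical helpers for illustration; not from libfb. */

/* Reference: wrap v into [0, size) with a divide on every call. */
static int wrap_ref(int v, int size)
{
    return ((v % size) + size) % size;
}

/* Wrap v into [0, size), size > 0, skipping the divide when in range. */
static int wrap_coord(int v, int size)
{
    if (v >= 0 && v < size)
        return v;           /* fast path: no divide */
    v %= size;              /* C99: remainder has the sign of v */
    if (v < 0)
        v += size;
    return v;
}

/* Given an already wrapped v, return the wrapped value of v + 1. */
static int wrap_next(int v, int size)
{
    ++v;
    return v == size ? 0 : v;
}

/* Checks both helpers against the reference for small inputs. */
static void check_wrap_coord(void)
{
    int v, size;
    for (size = 1; size <= 7; ++size) {
        for (v = -20; v <= 20; ++v) {
            int w = wrap_coord(v, size);
            assert(w == wrap_ref(v, size));
            assert(wrap_next(w, size) == wrap_ref(v + 1, size));
        }
    }
}
```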