Oprofile, swap

M. Edward (Ed) Borasky znmeb at cesmail.net
Tue Dec 18 10:41:37 EST 2007


John Richard Moser wrote:
> I just got my XO laptop (hopefully some kid gets the other one soon... 
> or not), and noticed it's slow and kind of buggy.  I think I'll get a 
> $25 4GB SD card for a SWAP area...
> 
> I should run oprofile too, and have it write to the SD card.  I 
> understand what an interpreted language like Python does to the CPU but 
> it shouldn't be this bad... it's only going to be like 100 times slower? 
>   An actual interpreter will...
> 
>   - Put pressure on the data cache as its code grows
>   - ... but keep the actual interpreter (code) in cache better
>   - Use a relatively large chunk of data for a look-up table
>   - ... or use some convoluted and hard to maintain code
>   - ... or optimally, a look-up table to start the decoding process, if
>     like a CPU bytecode interpreter (Java, CIL) it has an insn + address
>     mode + data (not QUITE optimal for Python, but maybe since simple
>     addition and call happens)
>   - Wind up doing what can easily become a multi-hundred-cycle decoding
>     process for each executed bytecode insn
> 
> Python rewrites to bytecode (good, interpreting text is slow!  Multiple 
> parsing!) but a lot of the main function calls in the API should be C, 
> not Python (taking some of the pressure off).  This means Python should 
> be doing a lot of logic in native space, rather than interpreting a lot 
> (unlike Java, which had its whole library written in Java...)
> 
> I suggest taking a look at PyPy for Python, which will dynamically 
> recompile Python to native code and likely give some good performance 
> benefits.  I 
> really can't stand JIT compilation and would prefer something that takes 
> advantage of Mono's own facilities, to centralize the effort in the JIT 
> at least (Mono has nice stuff), but IronPython is Microsoft Permissive 
> License which is not OSI approved.
> 
> As for real solutions, I want to profile things and see where they're 
> hanging.  I may need a Python profiler too, to get a look inside the 
> Python code and see if some functions there are also bad; oprofile will 
> tell me if Python itself is spending an ungodly amount of time in its 
> decoder functions but that's it.
> 
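
On the Python-profiler point: the standard library ships cProfile, which 
reports time per Python function, the view oprofile can't give. A minimal 
sketch, assuming a Python 3 interpreter (the one on the XO may be older); 
slow_sum and profile_call are made-up names standing in for real activity 
code, not anything from Sugar:

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical hot spot: a pure-Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time and keep the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()


if __name__ == "__main__":
    value, report = profile_call(slow_sum, 200000)
    print(report)
```

The report lists call counts and per-function times, so it shows which 
Python functions are expensive, while oprofile would show how much of that 
time the interpreter itself spends decoding.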

I don't have my physical unit yet, but I too am interested in profiling 
and performance tuning. Unfortunately I have no Python tuning experience 
so I can't be of much help at the moment. I do have a "virtual ship2", 
running on a 2.2 GHz Athlon64 X2, but that of course is cheating. :)

Oprofile is a bit tough to work with -- it makes you install a whole 
bunch of GUI libraries just to get at the low-level profiling stuff. And 
the kernel has to be built with the right options -- I don't know 
whether the OLPC kernel is. So for now, I think you'll probably be 
better off with simpler command-line tools.

I know "top" is there, but as far as I'm concerned the one must-have 
package is "sysstat". "sysstat" is a work of pure genius -- it started 
out as a Linux re-implementation of "sar" and "iostat", but it is much 
more than that now. Once I get my physical unit, I'll be looking at 
things in some detail.

I'm guessing that adding swap isn't going to help you. If you're memory 
bound, the solution is to stop activities that you aren't using, not to 
force the kernel to move stuff in and out of RAM. "top" will tell you. 
Open a terminal window and type "top". At the top of the display you'll 
see memory used, free, cached, etc. There's a keystroke that will sort 
processes by their resident set size. Type "h" to get a help menu. If I 
get a chance tonight, I'll fire up a bunch of activities in my virtual 
XO and see what it does when it runs out of RAM.
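
A rough sketch of where those "top" numbers come from, assuming a Linux 
/proc filesystem and a Python 3 interpreter. It reads the same fields 
(MemTotal, MemFree, Cached, swap) from /proc/meminfo and ranks processes by 
the VmRSS line in /proc/<pid>/status. The function names are made up for 
illustration and are not part of any existing tool:

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def parse_rss_kb(status_text):
    """Return (name, VmRSS in kB) from /proc/<pid>/status text.

    Kernel threads have no VmRSS line; they are reported as 0.
    """
    name, rss = "?", 0
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            rss = int(line.split()[1])
    return name, rss


def processes_by_rss(proc_root="/proc"):
    """List (rss_kb, pid, name) for every readable process, largest first."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                name, rss = parse_rss_kb(f.read())
        except OSError:
            continue  # process exited or is not readable
        rows.append((rss, int(entry), name))
    return sorted(rows, reverse=True)


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    for key in ("MemTotal", "MemFree", "Cached", "SwapTotal", "SwapFree"):
        print(f"{key:10s} {mem.get(key, 0):>10d} kB")
    for rss, pid, name in processes_by_rss()[:10]:
        print(f"{pid:>6d} {rss:>10d} kB  {name}")
```

If MemFree plus Cached is small and a handful of activities hold most of 
the resident memory, closing them is the first thing to try.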




