Oprofile, swap

M. Edward (Ed) Borasky znmeb at cesmail.net
Wed Dec 19 00:27:48 EST 2007


John Richard Moser wrote:
> 
> (Note:  most of this message probably isn't very useful; it's about 
> theoretical software architecture that nobody's going to implement, 
> that I can't prove, and that I'm not really 100% sure about.  Still, if 
> you WANT to read it, hey... remember, bad ideas sometimes get corrected 
> by people who are smart enough to turn them into GOOD ideas)
> 
> 
> Ivan Krstić wrote:
>> On Dec 18, 2007, at 12:27 PM, Jameson Chema Quinn wrote:
>>> Has anyone looked at Psyco on the XO?
>>
>>
>> Psyco improves performance at the cost of memory. On a 
>> memory-constrained machine, it's a tradeoff that can only be made in 
>> laser-focused, specific cases. We have not done the work -- partly for 
> 
> It would be wise to throw out the idea of laser-focusing which engine to 
> use.  Think of memory costs for running multiple versions of Python. 
> Then again, what IS wise?
> 
> Any such system needs to use memory efficiently.  I like the idea of 
> one based on Mono, since it has a compacting garbage collector, which 
> (although a cache destroyer by nature) at least shrinks memory usage. 
> Of course then you still have Mono on top of it, plus the CIL code 
> that's been generated, reflected, and JIT'd, which means you (again) 
> have two interpreters in memory (one written in CIL, one being Mono 
> itself), one of which gets dynamically recompiled (the Python one in 
> CIL), and all the intermediate (CIL) code gets kept for later profiling 
> and optimization...
> 
> ... didn't I say before I hate the concept of JIT dynamic compilation? 
> Interpreters just suck by nature, due to dcache problems (code becomes 
> data, so your instruction working set is effectively fixed and the load 
> doesn't spread across both icache and dcache as the program gets 
> bigger...) and due to the fact that you have to do a LOT of work to 
> decode an insn in software (THINK ABOUT QEMU).  Interpreters for 
> specific languages like Python and Perl have the advantage of not 
> having to be general CPU emulators, so they can have instructions that 
> are just function calls into native code.
> 
> 
> So, in order of execution time (fastest first):
> 
> Native code <  // *1
> JIT < // *2
> Specific language interpreter < // *3
> General bytecode interpreter < // *4
> Parser script interpreter // *5
> 
> *1:  Native code.  C, Obj-C, something compiled.  Everything else I 
> could mention is out of date.
> 
> *2:  Technically JIT output is native code, but there are also extra 
> considerations: memory use, and cache pressure comes into play 
> slightly.  After the ball gets rolling it just eats more memory, but 
> cache behavior and execution speed are fine.
> 
> *3:  A specific language interpreter might call a native strcpy() 
> directly, instead of having a CALL insn that goes into a bytecode 
> implementation of strcpy(), or a CALL insn that goes into a bytecode 
> strcpy() stub that just sets up a binding and calls the real native 
> strcpy().  The interpreter heads straight for native land, going 
> "function foobar() gets assigned token 0x?? and I'll know what to do 
> when I see it."
> 
> *4:  A general bytecode interpreter is going to have to be a CPU 
> emulator.  Java and Mono count, for the Java and CIL "CPUs".  These 
> CPUs don't really exist, but those interpreters work that way; they 
> even have their own assembly.
> 
> *5:  Some script engines are REALLY FREAKING DUMB and actually send each 
> line through a parser every time they see it, which is megaslow.  These 
> usually don't last, or just function as proof of concept until a real 
> bytecode translator gets written to make a specific language interpreter.
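The point in *3 can be sketched with a toy dispatch loop. This is plain illustrative Python, not how CPython, Perl, or Mono actually dispatch; the token values and the "native" routines are made up. The shape to notice: a language-specific interpreter binds a token straight to a native function, so a CALL never detours through an emulated bytecode body.

```python
# Toy language-specific interpreter: opcode tokens map directly to
# native callables (Python built-ins stand in for native C routines).

NATIVE = {}                    # token -> native callable

def bind(token, fn):
    """Assign a token to a native routine at load time."""
    NATIVE[token] = fn

# "function foobar() gets assigned token 0x?? and I'll know what to do
# when I see it."  Here 0x01/0x02 are arbitrary illustration tokens.
bind(0x01, str.upper)          # stands in for a native strcpy()-style call
bind(0x02, len)

def run(program, arg):
    """Dispatch loop: each token jumps straight into native code."""
    for token in program:
        arg = NATIVE[token](arg)
    return arg

print(run([0x01, 0x02], "hello"))   # upper-cases, then takes the length: 5
```

No decode step, no emulated CALL: the table lookup is the whole "instruction decoder", which is why a specific-language interpreter can beat a general CPU emulator.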
> 
> 
> Maybe, MAYBE, by twiddling with a JIT, you could convince it to discard 
> generated bytecode.  For example, assuming we're talking about a Python 
> implementation on top of Mono, and we can modify Mono any way we want 
> with reasonably little effort:
> 
>  - Python -> internal tree (let's say Gimple, like gcc)
>  - Gimple -> optimizer (Python)
>  - Gimple (opt) -> optimizer (general)
>  - Gimple (opt) -> CIL data (for reflection)
>  - FREE:  Gimple
>  - CIL (data) -> Reflection (CIL)
>  - FREE:  CIL data (for reflection)
>  - CIL -> CIL optimizer
>  - CIL (opt) -> JIT (x86)
>  - While (not satisfied)
>    - The annoying process of dynamic profiling
>    - CIL (opt, profiled) -> JIT (x86)
>  - FREE:  CIL
> 
> NOTE:  at the "FREE: CIL data" step, we are talking about the Python 
> interpreter freeing its own copy of the CIL data; Mono has now loaded a 
> copy as CIL code, so we don't need to hand it over again.  We're done 
> with it.
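The staging above can be mimicked in a few lines. Every name here (to_gimple, to_cil, jit) is a placeholder, not a real gcc or Mono API; the only point is the ownership discipline: each intermediate representation is dropped as soon as the next stage has consumed it.

```python
# Toy model of the staged pipeline.  Tuples stand in for real IR objects.

def to_gimple(source):   return ("gimple", source)   # Python -> internal tree
def optimize(ir):        return ("opt", ir)          # language + general opts
def to_cil(ir):          return ("cil", ir)          # Gimple -> CIL
def jit(cil):            return ("x86", cil)         # CIL -> native code

def compile_method(source):
    ir = optimize(to_gimple(source))
    cil = to_cil(ir)
    del ir                     # FREE: Gimple, nothing holds it now
    native = jit(cil)
    del cil                    # FREE: CIL, once re-profiling is done
    return native

native = compile_method("def f(): pass")
print(native[0])               # only the x86 result survives: "x86"
```

In the real system the hard part is the FREE steps: Mono keeps the CIL around, which is exactly the objection raised below.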
> 
> At this point we should have:
> 
>  - A CIL program for a Python interpreter
>  - A CIL interpreter (Mono)
>  - x86 native code for the program
> 
> Further, you should be able to make the Python interpreter do a number 
> of things:
> 
>  - Translate any Python-written libraries via JIT on a method-for-method
>    basis
>  - Translate Python bindings (Python calling C) to active CIL bindings
>    (to avoid calling back to the interpreter)
>  - Unload most of itself when done (say, when it's been unused for about
>    5 minutes of execution time), save for a method that loads the Python
>    interpreter back into memory AGAIN when a new method gets called, so
>    that it can dynamic-compile it.
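The "unload most of itself" idea in the last bullet is essentially a lazy proxy: the resident fragment is just a stub that reloads the engine on first use. A toy version, with entirely invented names (this is not the real interpreter's structure):

```python
# Toy lazy-reload stub: the "fragment" left in memory is a callable that
# maps the full engine back in the first time a new method needs it.

class EngineStub:
    def __init__(self, loader):
        self._loader = loader      # how to get the real engine back
        self._engine = None        # engine currently unloaded

    def call(self, method, *args):
        if self._engine is None:           # re-entry point: load the
            self._engine = self._loader()  # whole engine again...
        return self._engine(method, *args) # ...then jump to the handler

def load_engine():
    # stands in for mapping the Python interpreter back into memory
    return lambda method, *args: f"compiled {method}{args}"

stub = EngineStub(load_engine)
print(stub.call("foobar", 1))      # engine loads here, on first call
```

The stub itself is tiny, which matches the "a whole page of code would surprise me" estimate below.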
> 
> Thus, you should be able to achieve the near-total elimination of the 
> CIL program for a Python interpreter, leaving just Mono and the program 
> itself, already JIT'd to native code, in memory.  You would need a 
> fragment of the Python interpreter loaded to handle any entry back into 
> the Python interpreter, with a single function to load it again; each 
> re-entry point would just load the whole engine and then jump to the 
> actual handler in it.  This of course isn't much (I'd be surprised if 
> you needed a whole page of code for it).
> 
> Mind you, there are a number of flaws in this argument.  You probably 
> noticed most of them.
> 
>  - IronPython's license is not entirely acceptable, and nobody is going
>    to write a SECOND Python/CIL dynamic compiler
>  - You can get an IL stream for any compiled method, but Mono won't
>    free the CIL stuff.  It may actually be small enough not to care; or
>    not.  I believe it's actually too big to be feasible.  MAYBE you
>    could add something to Mono to allow flushing it permanently on
>    purpose (i.e. by the Python interpreter).
>  - You're still dealing with JIT'd code, which is still not shareable.
>    Mono seems to put it in WX segments, so by counting those (in
>    kilobytes) I can ascertain the exact size of the executable code.
>    Because it's not shared, it doesn't get evicted from memory when
>    unused the way normal .so files or /bin executables do.  I have a
>    memory analysis script I wrote that does the trick if I have bash
>    play with the output; here's what Tomboy looks like on x86-64,
>    about 13MB:
> 
>    $ echo $(( $(~/memuse.sh 19132 | grep "p:wx" | cut -f1 -d' ' | \
>       tr -d 'K' | tr -d 'B' | xargs | sed -e "s/ / + /g" ) ))
>    13544
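The same kind of count can be made without the shell pipeline by reading /proc/&lt;pid&gt;/smaps directly (Linux-only). This is a sketch, not the memuse.sh script quoted above (which isn't shown here): it sums the Rss of private mappings whose permission bits are writable+executable, which is where Mono keeps JIT output.

```python
# Sum resident size (kB) of private writable+executable mappings from
# the text of /proc/<pid>/smaps.  A rough stand-in for the shell trick.

def jit_kb_from_smaps(smaps_text):
    """Return total Rss in kB of mappings with perms like 'rwxp'."""
    total = 0
    counting = False
    for line in smaps_text.splitlines():
        fields = line.split()
        # Mapping header lines look like: "40000000-40100000 rwxp ..."
        if len(fields) >= 2 and "-" in fields[0] and len(fields[1]) == 4:
            perms = fields[1]
            counting = (perms[1] == "w" and perms[2] == "x"
                        and perms[3] == "p")
        elif counting and line.startswith("Rss:"):
            total += int(fields[1])    # value is in kB
    return total

# Usage against a live process (e.g. the Tomboy pid above):
#   with open("/proc/19132/smaps") as f:
#       print(jit_kb_from_smaps(f.read()))
```

Whether Rss or the mapping Size is the fairer number depends on what you want to charge the JIT for; the shell version above appears to sum sizes.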
> 
> I like to think of programs like kernels, or kernels like programs. 
> Either way, I like to treat applications like microkernels.  In the 
> embedded scene this may actually be critical; maybe you should think 
> that way for the XO, at least in small part.  (Re:  the part about 
> unloading the entire Python interpreter except for a little stub that 
> reloads it if needed...)
> 
> 
>> -- 
>> Ivan Krstić <krstic at solarsail.hcs.harvard.edu> | http://radian.org
>>
>>
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Devel mailing list
> Devel at lists.laptop.org
> http://lists.laptop.org/listinfo/devel

Should we tell him about gForth/vmgen? YARV (Ruby 1.9)? Jython? jRuby? <weg>

But seriously, folks, if you want a hard-core, tweakable, compact way to 
code on the XO, either install the gForth RPM or get the "native" XO 
Forth derived from the boot firmware's Forth. IIRC the gForth RPM 
installs in less than a megabyte, contains a complete ANS Forth 
compiler/decompiler and assembler/disassembler, doesn't need a 
linker/loader, and is *way* cooler than Python or C. :) If you go with 
gForth, you might want to check out the Anton Ertl/David Gregg papers on 
virtual machines, branch prediction, etc.
