jffs zlib tuning
NoiseEHC
NoiseEHC at freemail.hu
Mon Jan 7 13:49:47 EST 2008
> However, I recommend hacking in libz first, making it work with
> gzip, and start porting it to the kernel as the next step. Debugging
> and benchmarking in userspace is *so* much easier.
>
Ah, I tried that. Unfortunately, the zlib in the kernel is a heavily
modified zlib, and I was not able to compile it in user space. So I
have learned to write kernel modules instead (it took just 2 days; it
is so simple compared to Windows drivers that I still cannot believe
it). And there is a comment in the kernel zlib saying that user-space
support was removed...
> Also, I'd be very surprised if such an obvious optimization
> hadn't been tried already in 20+ years of gzip. Try digging
> around: you may find that it's not worth it.
>
The optimization I have in mind is absolutely Geode-specific. First,
the code needs some prefetching, and second, it has a lot of branches.
The Geode appears to have a very simple 1-bit branch predictor (it is
not documented, but it behaves that way), so it can waste 20-40 cycles
on every run (every length/distance code). I know it is hard to beat a
C compiler nowadays, so I am sure that simply rewriting the code in asm
would not speed things up (more than 5 years ago LZO had several asm
implementations for the 486/586/686, but ironically all of them were
slower than the compiler-generated code).
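To illustrate the flavor of tweak I mean, here is a hypothetical
helper (not the actual inffast.c code) using GCC's __builtin_prefetch
and __builtin_expect to hide the input fetch latency and keep a simple
1-bit predictor on the common path of the match copy:

/*
 * Hypothetical sketch only -- not the real inflate_fast() loop.
 * Copies one LZ77 match while prefetching upcoming compressed input
 * and hinting the common (non-overlapping) case for the predictor.
 */
#include <string.h>

static void lz_copy_match(unsigned char *out, unsigned dist,
			  unsigned len, const unsigned char *next_in)
{
	const unsigned char *from = out - dist;

	/* Ask for the cache line holding the next compressed bytes so
	 * the following bit-buffer refill does not stall. */
	__builtin_prefetch(next_in + 64, 0, 0);

	if (__builtin_expect(dist >= len, 1)) {
		/* Common case: source and destination do not overlap. */
		memcpy(out, from, len);
	} else {
		/* Overlapping run: must copy byte by byte. */
		do {
			*out++ = *from++;
		} while (--len);
	}
}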
Now that you have mentioned that jffs2 uses only 4K blocks, it is
possible that the bottleneck is not in inffast.c at all. Do you have
ANY perf/profile data, please?
All I would like to know is whether the bottleneck lies in inffast or
not:
/*
When large enough input and output buffers are supplied to inflate(), for
example, a 16K input buffer and a 64K output buffer, more than 95% of the
inflate execution time is spent in this routine.
*/
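For reference, a quick user-space sketch (using stock zlib, not the
modified kernel copy; the data and loop count are placeholders) that
repeatedly inflates a single 4K block -- the jffs2 case -- so a
profiler such as oprofile or gprof can show how much time really lands
in inflate_fast():

/* Build with: gcc -O2 -pg bench.c -lz */
#include <stdio.h>
#include <string.h>
#include <zlib.h>

#define CHUNK 4096

int main(void)
{
	static unsigned char plain[CHUNK], packed[2 * CHUNK], out[CHUNK];
	uLongf packed_len = sizeof(packed);
	long i;

	/* Placeholder data; real runs should use jffs2-like content. */
	memset(plain, 'x', sizeof(plain));
	compress(packed, &packed_len, plain, sizeof(plain));

	for (i = 0; i < 100000; i++) {	/* enough work to profile */
		z_stream s;
		memset(&s, 0, sizeof(s));
		inflateInit(&s);
		s.next_in = packed;
		s.avail_in = packed_len;
		s.next_out = out;
		s.avail_out = sizeof(out);
		inflate(&s, Z_FINISH);
		inflateEnd(&s);
	}
	printf("%lu compressed bytes per 4K block\n",
	       (unsigned long)packed_len);
	return 0;
}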