[Http-crcsync] crccache ready for some testing I think

Alex Wulms alex.wulms at scarlet.be
Sun Mar 29 17:18:30 EDT 2009


Have you already submitted this improvement to git? I'm up to date with the 
current head (i.e. the changes you submitted 22 hours ago) and the problem 
still persists in that version.


 

On Sunday 29 March 2009, Toby Collett wrote:
> I fixed a bug in the buffering code this weekend that should have solved
> the problem of blocks not matching. At the moment it tries to match a block
> only if it has more than block size bytes, and will keep blocksize - 1 bytes
> at the end of the buffer if it doesn't get a match (so these can be tried
> again when more data arrives).
>
> This is possibly not as efficient as we could be, but it *should* match all
> possible blocks. Unfortunately it also makes the code a little more complex
> to read and debug.
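>
> Roughly, the rule looks like this (an untested sketch with made-up helper
> names, not the actual mod_crccache code):
>
>   #include <stdio.h>
>   #include <string.h>
>
>   #define BLOCK_SIZE 8
>
>   /* Stand-ins for the real matcher and output path; purely illustrative. */
>   static int match_block_at(const char *p)
>   {
>       return memcmp(p, "ABCDEFGH", BLOCK_SIZE) == 0;
>   }
>   static void emit_block_reference(void) { printf("[block]"); }
>   static void emit_literal(char c)       { putchar(c); }
>
>   /* Only try to match while a full block is buffered; on a miss, emit one
>    * literal and slide forward, so at most BLOCK_SIZE - 1 unconsumed bytes
>    * remain to be retried when the next chunk of data arrives. */
>   static void consume(char *buf, size_t *len)
>   {
>       size_t pos = 0;
>       while (*len - pos >= BLOCK_SIZE) {
>           if (match_block_at(buf + pos)) {
>               emit_block_reference();
>               pos += BLOCK_SIZE;
>           } else {
>               emit_literal(buf[pos]);
>               pos++;
>           }
>       }
>       memmove(buf, buf + pos, *len - pos);  /* keep the tail for next time */
>       *len -= pos;
>   }
>
>   int main(void)
>   {
>       char buf[64] = "xyzABCDEFGHtail";
>       size_t len = strlen(buf);
>       consume(buf, &len);
>       printf("\nretained %zu byte(s)\n", len);
>       return 0;
>   }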
>
> Toby
>
>
> ----- Original Message -----
> From: "Alex Wulms" <alex.wulms at scarlet.be>
> To: "Toby Collett" <toby.collett at inro.co.nz>
> Cc: http-crcsync at lists.laptop.org
> Sent: Monday, 30 March, 2009 10:01:38 GMT +12:00 New Zealand
> Subject: Re: [Http-crcsync] crccache ready for some testing I think
>
> Hi,
>
> Currently I'm experimenting with the various parameters of the cache module
> to fine-tune the cache behaviour.
>
> I have also investigated why the cache server stops finding block matches
> after the first difference. I think it is because, at the moment, the cache
> server does not read far enough ahead in the data stream. Basically, Rusty's
> crc_read_block function must be fed with as much data as possible, so that
> it can detect at which point the unmatched data ends and the next matching
> block begins.
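>
> In other words, the encoder should be driven with one big buffer and called
> in a loop until it stops consuming input. A rough sketch of what I mean
> (from memory of the crcsync interface; the header path and the exact result
> convention are assumptions on my part, so treat the details with care):
>
>   #include <stddef.h>
>   #include <stdio.h>
>   #include "ccan/crcsync/crcsync.h"   /* assumed header location */
>
>   /* Feed one large chunk (ideally the whole response) to crc_read_block
>    * repeatedly.  Assumed convention: the return value is the number of
>    * bytes consumed; *result < 0 reports a matched block, *result > 0
>    * reports that many literal bytes. */
>   static void encode_chunk(struct crc_context *ctx,
>                            const char *data, size_t len)
>   {
>       size_t used = 0;
>       while (used < len) {
>           long result;
>           size_t consumed = crc_read_block(ctx, &result,
>                                            data + used, len - used);
>           if (result < 0)
>               printf("matched block %ld\n", -result - 1);
>           else if (result > 0)
>               printf("%ld literal bytes\n", result);
>           if (consumed == 0)
>               break;   /* needs more data than we have buffered */
>           used += consumed;
>       }
>   }
>
> The point is simply that the larger the chunk handed over in one go, the
> better the chance of resynchronising on a block boundary after a difference.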
>
> I have written a small example program that demonstrates this better.
> Should I submit it to git, email it to this list, or send it only to the
> people who are interested in it, given that it ultimately serves no
> long-term purpose?
>
> As a next step, I'm going to see if I can refactor the server to read more
> data into its buffer (ideally the full response) and only then make multiple
> calls to the crc_read_block function back to back, so that it can do a
> better job. The price to pay for this approach is obviously higher memory
> usage on the server, but it will probably lead to better compression
> results, especially for HTML pages where the first difference might already
> be some time-stamp in the HTML headers. One idea to minimize the memory
> overhead is to apply this look-ahead approach only to text/... MIME types
> and not to other MIME types such as images, audio or video.
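>
> The MIME-type test itself would be cheap; something like the following in
> the output filter (only the content-type check is real Apache API here,
> the rest is a placeholder):
>
>   #include <strings.h>
>   #include "httpd.h"
>
>   /* Decide whether this response is worth the memory-hungry look-ahead:
>    * buffer text-ish responses fully, stream everything else as today.
>    * (Sketch only; how the buffering itself is done is a separate matter.) */
>   static int want_full_lookahead(request_rec *r)
>   {
>       return r->content_type != NULL
>           && strncasecmp(r->content_type, "text/", 5) == 0;
>   }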
>
> Thanks and kind regards,
> Alex
>
> On Saturday 28 March 2009, Toby Collett wrote:
> > Hi,
> > I have fixed up a couple of critical bugs in the crccache modules and I
> > feel that the code is now able to be tested a little bit wider. For an
> > unchanged upstream file you get high 90% size saving, this can drop off
> > pretty sharply, a 'two lines changed' copy of w3.org homepage only got
> > 75% savings, but that isnt too shabby either.
> >
> > Lots of work to be done tuning the cache etc., as Alex has pointed out,
> > but at this point you should be able to browse over the link (whether it
> > is slower or faster is another question).
> >
> > The modules should still build against any recent Apache build, so you
> > shouldn't need to compile all of Apache. Basic example configs for Ubuntu
> > are in the git repo, along with instructions for SUSE.
> >
> > The server end will print out stats for transmission size at the end of
> > an encoded response. This does not take into account the additional
> > header size for the request.
> >
> > Toby



