[Http-crcsync] crccache ready for some testing I think

Toby Collett toby.collett at inro.co.nz
Sun Mar 29 17:08:12 EDT 2009


I fixed a bug in the buffering code this weekend that should have solved the problem of blocks not matching. At the moment it tries to match a block only if it has more than block size bytes, and if it doesn't get a match it keeps blocksize - 1 bytes at the end of the buffer (so these can be tried again when more data arrives).

This is possibly not as efficient as it could be, but it *should* match all possible blocks. Unfortunately, it also makes the code a little harder to read and debug.
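A minimal C sketch of the buffering rule described above, under simplifying assumptions: BLOCK_SIZE, block_table, feed() and finish() are hypothetical names, a match is attempted once a full block is buffered, and find_block() does a plain memcmp lookup where the real module works with CRCs. On a miss the window slides one byte; once fewer than BLOCK_SIZE bytes remain, the last BLOCK_SIZE - 1 bytes are kept for the next chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 4 /* tiny block size, for illustration only */
#define BUF_CAP 256  /* fixed buffer capacity for this sketch */

/* Hypothetical table of blocks the client already has cached. */
typedef struct {
    const char **blocks; /* each entry is BLOCK_SIZE bytes long */
    size_t nblocks;
} block_table;

typedef struct {
    char buf[BUF_CAP];
    size_t len;      /* bytes currently buffered */
    size_t matches;  /* blocks sent as block references */
    size_t literals; /* bytes sent as literal data */
} stream_state;

/* Stand-in for CRC matching: exact comparison against each cached block. */
static int find_block(const block_table *t, const char *window) {
    for (size_t i = 0; i < t->nblocks; i++)
        if (memcmp(t->blocks[i], window, BLOCK_SIZE) == 0)
            return (int)i;
    return -1;
}

/* Drop the first n bytes from the buffer. */
static void consume(stream_state *s, size_t n) {
    memmove(s->buf, s->buf + n, s->len - n);
    s->len -= n;
}

/* Try to match while at least one full block is buffered. */
static void scan(stream_state *s, const block_table *t) {
    size_t pos = 0; /* bytes before pos did not start a match */
    while (s->len - pos >= BLOCK_SIZE) {
        if (find_block(t, s->buf + pos) >= 0) {
            s->literals += pos; /* unmatched prefix goes out as literals */
            s->matches++;
            consume(s, pos + BLOCK_SIZE);
            pos = 0;
        } else {
            pos++; /* slide the window by one byte */
        }
    }
    /* Emit the unmatched prefix; at most BLOCK_SIZE - 1 bytes stay buffered
       so a block that straddles the next chunk can still match. */
    s->literals += pos;
    consume(s, pos);
}

/* Append a newly arrived chunk and scan it. */
static void feed(stream_state *s, const block_table *t, const char *data, size_t n) {
    assert(s->len + n <= BUF_CAP);
    memcpy(s->buf + s->len, data, n);
    s->len += n;
    scan(s, t);
}

/* End of response: whatever is left can only be sent as literals. */
static void finish(stream_state *s) {
    s->literals += s->len;
    s->len = 0;
}
```

Feeding the chunks "xAB", "CDEF" and "GHy" against cached blocks "ABCD" and "EFGH" gives two block matches and two literal bytes, even though both blocks straddle chunk boundaries. The price is the extra bookkeeping around the retained tail.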

Toby


----- Original Message -----
From: "Alex Wulms" <alex.wulms at scarlet.be>
To: "Toby Collett" <toby.collett at inro.co.nz>
Cc: http-crcsync at lists.laptop.org
Sent: Monday, 30 March, 2009 10:01:38 GMT +12:00 New Zealand
Subject: Re: [Http-crcsync] crccache ready for some testing I think

Hi,

Currently I'm experimenting with the various parameters of the cache module to 
fine-tune the cache behaviour.

I have also investigated why the cache server stops finding block matches 
after the first difference. I think it is because, at the moment, the cache 
server does not read far enough ahead in the data stream. Basically, Rusty's 
crc_read_block function must be fed as much data as possible, so that it can 
detect where the unmatched data ends and the next matching block begins.
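To illustrate why the matcher needs data beyond the first difference, here is a C sketch of a sliding search for the next matching block. It substitutes a simple additive rolling sum for the CRCs crcsync works with, and find_next_match() is a hypothetical helper, not part of Rusty's API. The sum is updated in constant time per byte shift, and a full memcmp confirms each candidate.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Weak checksum of one window: plain byte sum (stand-in for a CRC). */
static uint32_t window_sum(const unsigned char *p, size_t n) {
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Return the first offset >= start where data[off..off+bs) equals block,
   or -1 if no such window fits inside the len bytes buffered so far. */
static long find_next_match(const unsigned char *data, size_t len, size_t start,
                            const unsigned char *block, size_t bs) {
    if (bs == 0 || len < bs || start > len - bs)
        return -1;
    uint32_t target = window_sum(block, bs);
    uint32_t sum = window_sum(data + start, bs);
    for (size_t off = start;; off++) {
        /* cheap checksum first, full compare only on a candidate hit */
        if (sum == target && memcmp(data + off, block, bs) == 0)
            return (long)off;
        if (off + bs >= len)
            break; /* no further window fits in the buffer */
        /* roll the window one byte: drop data[off], add data[off + bs] */
        sum = sum - data[off] + data[off + bs];
    }
    return -1;
}
```

With the new body "abcdZZZefgh" and cached block "efgh", a search from offset 4 finds the block at offset 7 when all 11 bytes are buffered, but returns -1 when only the first 8 bytes are available, which is the situation described above.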

I have written a small example program that better demonstrates what I 
described above. Should I submit it to git, email it to this list, or send it 
only to the people who are interested in it, since it ultimately serves no 
long-term purpose?

As a next step, I'm going to see if I can refactor the server to read more 
data into its buffer (ideally the full response) and only then make multiple 
consecutive calls to the crc_read_block function, so that it can do a better 
job. The obvious price of this approach is higher memory usage on the server, 
but it will probably lead to better compression, especially for HTML pages 
where the first difference may already be a timestamp in the HTML headers. 
One idea to minimize the memory overhead is to apply this look-ahead approach 
only to text/... MIME types and not to others such as images, audio or video.
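The idea of applying full look-ahead only to text/... responses could be gated on the Content-Type value. A hedged C sketch: wants_full_lookahead() and has_prefix_ci() are hypothetical names, and how the module obtains the Content-Type value is left out. The comparison is case-insensitive because MIME type and subtype names are.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive prefix test (MIME type names are case-insensitive). */
static int has_prefix_ci(const char *s, const char *prefix) {
    for (; *prefix != '\0'; s++, prefix++)
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0; /* mismatch, or s ended early */
    return 1;
}

/* Hypothetical policy: buffer the whole response before block matching
   only for text/... responses; images, audio and video are streamed. */
static int wants_full_lookahead(const char *content_type) {
    if (content_type == NULL)
        return 0; /* no Content-Type: fall back to streaming */
    return has_prefix_ci(content_type, "text/");
}
```

Because only the prefix is compared, a value with parameters such as "text/html; charset=utf-8" still qualifies, while "image/png" does not.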

Thanks and kind regards,
Alex
 

On Saturday 28 March 2009, Toby Collett wrote:
> Hi,
> I have fixed a couple of critical bugs in the crccache modules and I
> feel the code is now ready for somewhat wider testing. For an
> unchanged upstream file you get size savings in the high 90s (percent); this can drop off
> pretty sharply: a 'two lines changed' copy of the w3.org homepage only got 75%
> savings, but that isn't too shabby either.
>
> There is still lots of work to be done tuning the cache etc., as Alex has pointed out, but
> at this point you should be able to browse over the link (whether it is
> slower or faster is another question).
>
> The modules should still build against any recent Apache build, so you
> shouldn't need to compile all of Apache. Basic example configs for Ubuntu
> are in the git repo, along with instructions for SUSE.
>
> The server end will print out stats for transmission size at the end of an
> encoded response. This does not take into account the additional header
> size for the request.
>
> Toby
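The percentages quoted above are relative savings in transferred bytes. A small C helper makes the arithmetic explicit; savings_percent() is hypothetical and is not the module's stats code. The optional extra_request_bytes argument shows how the request-header overhead that the server's stats leave out would reduce the figure.

```c
#include <stddef.h>

/* Percentage of bytes saved versus sending the original body.
   extra_request_bytes charges any overhead the stats leave out
   (e.g. additional request headers); pass 0 for a body-only figure. */
static double savings_percent(size_t original_bytes, size_t encoded_bytes,
                              size_t extra_request_bytes) {
    if (original_bytes == 0)
        return 0.0; /* nothing to save on an empty body */
    double sent = (double)encoded_bytes + (double)extra_request_bytes;
    return 100.0 * (1.0 - sent / (double)original_bytes);
}
```

For example, a 1000-byte page encoded into 250 bytes is a 75% saving, and adding 250 bytes of request-header overhead would halve that to 50%.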



