[Http-crcsync] Literal blocks in crcsync protocol are now deflated, and some further discussion on the protocol
Alex Wulms
alex.wulms at scarlet.be
Mon Apr 27 14:05:09 EDT 2009
Hi,
I have updated the crccache client and server to compress the literal blocks
(i.e. blocks of non-matched data) with the zlib deflate algorithm, so that the
total size is as small as possible.
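To illustrate the idea, here is a minimal sketch in Python of deflating a single literal block and inflating it again on the other side. The function names are made up for this example, and it uses the zlib-wrapped format that Python's zlib module produces by default; whether crccache sends raw deflate data or the zlib wrapper on the wire is not specified here.

```python
import zlib


def deflate_literal(block: bytes) -> bytes:
    """Compress one literal (non-matched) block. Hypothetical helper name."""
    # Level 9 trades CPU time for the smallest output.
    return zlib.compress(block, 9)


def inflate_literal(payload: bytes) -> bytes:
    """Restore the original literal block on the receiving side."""
    return zlib.decompress(payload)


# Example literal data: markup that matched no cached block.
literal = b"<p>This paragraph did not match any cached block.</p>\n" * 20
wire = deflate_literal(literal)

print(len(literal), "bytes before deflate,", len(wire), "bytes after")
assert inflate_literal(wire) == literal
```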
I have also made a few small changes with respect to how to deal with
gzip-encoded content:
1) The crccache-server now dynamically uses mod-deflate to inflate
gzip-encoded content; it only inserts the inflate filter (just before itself)
when the request contains the crcblocks header, i.e. when it intends to
crcsync-encode the response.
2) On the client-side, mod-deflate is now always invoked in inflate mode so
that fresh pages (i.e. pages that have not yet been crcsync-encoded) are
stored non-compressed in the cache, so that they can serve as a good basis to
calculate crcblocks on the next request for the same page.
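The decision logic of points 1) and 2) can be sketched roughly as follows. This is only a Python model of the behaviour, not the actual Apache module code: the filter name CRCCACHE_ENCODE and the step name CACHE_SAVE are hypothetical labels, and the exact spelling of the request header is taken to be "crcblocks" as an assumption.

```python
def server_filter_chain(request_headers: dict) -> list:
    """Filters the server side would run, in order (model only)."""
    # Header names are case-insensitive in HTTP, so normalise first.
    names = {name.lower() for name in request_headers}
    if "crcblocks" not in names:
        # No crcsync encoding requested: leave the response untouched.
        return []
    # Inflate first, so crcsync encoding sees non-compressed data.
    return ["INFLATE", "CRCCACHE_ENCODE"]


def client_filter_chain() -> list:
    """Steps the client side would run, in order (model only)."""
    # Always inflate, so the cache stores non-compressed pages.
    return ["INFLATE", "CACHE_SAVE"]


print(server_filter_chain({"crcblocks": "..."}))
print(server_filter_chain({"Accept-Encoding": "gzip"}))
print(client_filter_chain())
```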
This whole deflation/inflation setup is required because the principle of
crcsync does not work well with compressed files/pages. It only works well
with non-compressed data. After all, as soon as only one byte changes in the
original page, the entire compressed stream after that single byte changes
completely, so no further blocks would match anymore if working with
gzip-encoded streams and gzip-encoded cache entries.
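The effect is easy to try out with a small Python experiment. It is a simplification: it compares fixed-size blocks at identical offsets, whereas crcsync matches blocks by checksum, and the block size and the generated test page are arbitrary choices made for this example.

```python
import zlib
from typing import Tuple

BLOCK = 64  # arbitrary block size for this experiment


def matching_blocks(old: bytes, new: bytes, size: int = BLOCK) -> Tuple[int, int]:
    """Return (matched, total) for blocks of `new` compared to `old` at the same offsets."""
    matched = 0
    total = 0
    for off in range(0, len(new), size):
        total += 1
        if new[off:off + size] == old[off:off + size]:
            matched += 1
    return matched, total


def make_page(n_lines: int = 400) -> bytes:
    """Generate a synthetic, non-compressed page."""
    lines = (f"<li>item {i}: value {i * i % 977}</li>\n" for i in range(n_lines))
    return "".join(lines).encode()


old_page = make_page()
changed = bytearray(old_page)
changed[10] ^= 0x01  # alter a single byte near the start
new_page = bytes(changed)

plain = matching_blocks(old_page, new_page)
packed = matching_blocks(zlib.compress(old_page), zlib.compress(new_page))

print("non-compressed: %d of %d blocks match" % plain)
print("compressed:     %d of %d blocks match" % packed)
```

On the non-compressed pages, every block except the one containing the changed byte still matches. How many blocks of the compressed streams still match depends on the data and the compressor settings.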
This has also made me think a lot about the whole protocol story. I have
studied the original delta-http RFC that has been mentioned already a few
times in this thread, and have also slept on it for a few nights. I
now have some ideas on how we can make a clean protocol, but must still work
them out on paper. Several ideas from the original delta-http protocol RFC are not
really applicable to our situation; that paper is entirely based on the
assumption that the server works with semi-static pages and keeps a history
of those pages, while our implementation is based on the idea that servers
will *not* know the page that was previously served to the client, which is
indeed more realistic given the dynamic nature of most web sites. So we need
a slightly different approach than the one described in that RFC.
I will put my ideas on paper tomorrow and submit them to this list for further
discussion.
Kind regards,
Alex