[Http-crcsync] General comments on crcsync document

Toby Collett toby.collett at gmail.com
Sat Jul 18 04:51:15 EDT 2009


2009/7/17 Patrick McManus <mcmanus at ducksong.com>

> it occurs to me that there is an obvious mitigation: any document with
> 100K hash values and just 1 hit is a really poor delta and indeed
> probably isn't a delta at all. Any response that is _almost_ all
> literals should probably just make itself into all literals (or just
> plain 200-non-delta)..
>

The problem here is that a non-buffering implementation of a chunked
transfer doesn't know it has only got one hash match until after it has
started sending the data...
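To make the streaming constraint concrete, here is a minimal sketch (all names and the block-classification scheme are invented for illustration, not taken from the crcsync code): each block is classified and emitted as soon as it is seen, so the encoder only learns the overall hit rate after the literals are already on the wire.

```python
import zlib

def stream_delta(content: bytes, client_hashes: set, block_size: int = 4096):
    """Yield ('match', crc) or ('literal', data) chunks one block at a time.

    Hypothetical streaming encoder: chunks are emitted immediately, so by
    the time the loop finishes and we could count how few matches there
    were, it is too late to fall back to a plain 200 non-delta response.
    """
    matches = 0
    for off in range(0, len(content), block_size):
        block = content[off:off + block_size]
        crc = zlib.crc32(block)
        if crc in client_hashes:
            matches += 1
            yield ('match', crc)      # reference to a block the client already has
        else:
            yield ('literal', block)  # raw data, already sent downstream
    # Only here do we know whether the delta was worthwhile.
```

A buffering encoder could inspect the match count before committing to a delta response, but that gives up the streaming property the thread is concerned with.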

Also, from my limited understanding of CRC sums, the excellent properties
for detecting small changes seem to apply only to very small changes (i.e.
bursts of up to the hash length). Even for very similar web pages, the
chance of having all of the changes contained within a 32-bit range seems
slim.

This one is a bit of a long shot, but is it possible to just send the
content twice: once encoded, followed by a stream flush, and once
unencoded? At the client end, if they get a strong hash error they just
keep receiving the rest of the content unencoded; if they get a strong
hash match, they terminate the connection. This would probably play hell
with caches, cause complications on the server end (you would have to
buffer the whole file for appending), and chances are half the unencoded
file would already be in transit by the time you killed the connection in
the no-corruption case, but I just wondered whether it would trigger any
ideas from others along the same lines.
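The client side of that idea could be sketched roughly as below. The wire format here is entirely made up (delta-reconstructed page, then a strong hash, then the plain copy), and the function names are hypothetical; it only shows the decision the client would make, not the socket handling.

```python
import hashlib

def receive_twice(delta_page: bytes, strong_hash: bytes, plain_copy: bytes) -> bytes:
    """Prefer the delta-reconstructed page if its strong hash verifies.

    Hypothetical "send it twice" client: if the strong hash over the
    reconstructed page matches, a real client would close the connection
    here to cut off the unencoded copy still in transit; on a mismatch it
    discards the delta result and keeps reading the plain copy instead.
    """
    if hashlib.sha1(delta_page).digest() == strong_hash:
        return delta_page   # no corruption: terminate the connection early
    return plain_copy       # corruption: fall back to the unencoded stream
```

As the paragraph above notes, the saving depends on how quickly the connection can actually be torn down; much of the redundant copy may already be in flight.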

Toby


