[Http-crcsync] Fwd: Re: [Fwd: Re: Google summer of code : 'web pages over rsync']

Alex Wulms alex.wulms at scarlet.be
Wed Jun 17 16:58:40 EDT 2009


Hi,

Interesting info. I almost got concerned when I noticed the patent reference, 
but after reading the patent description it turns out that, fortunately, we 
are following a different approach, if I understand it correctly.

In their implementation it is the server that keeps track of the reference 
resources (the canonical document) while in our implementation it is the 
client (the cached page, over which the crcblocks are calculated).

The advantage of the server managing the reference resources is that it can 
generate a really minimal delta. In our case, the delta is still small but, 
once uncompressed, it is always a multiple of the blocksize 
(client-page-size/40, assuming we stick to 40 blocks).
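
To make the blocksize point concrete, here is a rough Python sketch of the 
client-side block hashing, assuming the cached page is available as a byte 
string. The real code uses the crcsync rolling hash; zlib.crc32 only stands 
in here to show the block layout:

    import math
    import zlib

    NUM_BLOCKS = 40  # 40 blocks per cached page, as discussed above

    def block_hashes(cached_page):
        """Split the cached page into ~NUM_BLOCKS blocks and hash each one."""
        block_size = max(1, math.ceil(len(cached_page) / NUM_BLOCKS))
        return block_size, [
            zlib.crc32(cached_page[i:i + block_size])
            for i in range(0, len(cached_page), block_size)
        ]

The client sends those ~40 hashes along with the request; block_size above is 
the granularity the uncompressed delta is rounded to.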

We would have to do some measurements to see how often we stay below the 
magical 3000-byte threshold. I will analyze my test logs this coming weekend. 
I will also see if, on the server side, I can set the zlib compression level 
more aggressively, and analyze the impact of that.
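
For reference, zlib levels run from 1 (fastest) to 9 (smallest output). A 
minimal Python sketch of that setting (illustrative only, not the actual 
server code):

    import zlib

    def compress_delta(delta, level=9):
        # level 9 trades extra CPU on the server for the smallest output;
        # this is the ratio being made more aggressive above
        return zlib.compress(delta, level)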

Thanks and brs,
Alex

PS: so far, the http-crcsync protocol only deals properly with text-based 
resources. It can't do a proper delta on multimedia resources.


On Wednesday 17 June 2009, Gervase Markham wrote:
> Hey guys,
>
> Here are some thoughts on the protocol from a Mozilla contributor who
> has worked on similar things in the past. I thought they might be very
> useful when considering the protocol design, particularly the bit about
> packet sizes.
>
> Gerv
>
> -------- Original Message --------
> Subject: Re: [Fwd: Re: Google summer of code : 'web pages over rsync']
> Date: Wed, 01 Apr 2009 14:14:40 -0400
> From: Patrick McManus <mcmanus at ducksong.com>
> To: Gervase Markham <gerv at mozilla.org>
>
> Hi Guys,
>
> I did a commercial implementation of this back in the late 90's -
> DeltaEdge was part of AppliedTheory's CDN offering. (later clearblue,
> now navisite - I think they are still running it somewhere at navisite -
> I'm not affiliated with them anymore).
>
> I am the inventor (though not the assignee) of a related patent -
> http://www.wikipatents.com/6826626.html
>
> The CDN introduced 2 proxies - one "close" to the UA and one colocated
> with the origin server. The delta took place between them - hopefully on
> the high latency hop. It was all done with TE and Transfer-Encoding so
> any client or server that wanted to participate could - it wasn't a
> proprietary tunnel. The proxies are closed source software written from
> scratch.
>
> Akamai and Limelight have similar technology they offer. iirc (it's been
> a long while) those do use proprietary tunnels and might require special
> APIs.. DeltaEdge was just an encoding, which I think is what you're
> talking about here.
>
> Anyhow - the big papers on the topic were a SIGCOMM paper that inspired
> me (97 perhaps) by Jeff Mogul, and some work by Jim Douglis at ATT..
> worth googling them.
>
> Jason's question was about metrics.. the first cut was media-type.. I
> only dealt with text types as content-aware algorithms are typically
> needed to get useful delta results with binary types.. does the rsync
> proposal address that?
>
> as for text types, the issue is really the # of rtts needed to move the
> data.. all text is generally relatively small, but you can make it
> really snappy if you get it below the magic 3000 byte mark. 3000
> generally turns into 2 pkts which will almost always fit in one
> congestion window - which in turn means you can do the whole req/resp in
> one rtt and that's a HUGE win. there are a fair number of exceptions, but
> iirc the measurements (and I could take a million of them on a very busy
> application server running this) bore this out very well - many of the
> transactions were either super fast or just marginally improved
> depending on which side of this magic line they fell on.
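>
> to spell out that arithmetic, a rough sketch (assuming a 1460 byte MSS and
> an initial window of 2 segments - illustrative numbers, not taken from
> those measurements):
>
>     MSS = 1460        # TCP payload of a full-size packet on a 1500 MTU path
>     INIT_CWND = 2     # segments a sender may push before the first ACK
>
>     ONE_FLIGHT = MSS * INIT_CWND   # 2920 bytes - roughly the magic 3000
>
>     # anything that fits in ONE_FLIGHT goes out in a single burst, so the
>     # response never stalls waiting for an ACK and the whole req/resp
>     # costs one rtt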
>
> And yes, you can jam a lot of compressed delta info in 3KB - so this is
> a pretty effective strategy. The client CPU requirements are pretty
> simple - it's generally a one pass process to reassemble a lot of
> pointer/length references and do some huffman decoding - though that's
> going to depend on the delta algorithm of course.
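>
> the reassembly pass is essentially the following (a generic copy/insert
> sketch in python, not the actual DeltaEdge wire format):
>
>     def apply_delta(base, ops):
>         # ops: ('copy', offset, length) pointers into the cached base copy,
>         # or ('insert', data) for literal bytes carried in the delta
>         out = bytearray()
>         for op in ops:
>             if op[0] == 'copy':
>                 _, offset, length = op
>                 out += base[offset:offset + length]
>             else:
>                 out += op[1]
>         return bytes(out)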
>
> I always thought the hard part was "delta against what?". i.e. what is
> the base document.. that's easy from the client's pov - it should be the
> last copy of the document that it has. But the server cannot keep around
> every freaking revision of a dynamic resource (or even 1 per client)..
> so it has to have a canonical document to delta against.. in deltaedge
> we called that the reference-entity. A lot of time was spent on figuring
> out how to manage a set of REs, when to update them, etc.. It's
> probably not a big deal for a mozilla client implementation, but it's
> worth thinking about what it might mean in terms of overall uptake of
> the system. (i.e. it can make the server side pretty resource hungry.)
>
> my two cents.
>
> -Patrick
> _______________________________________________
> Http-crcsync mailing list
> Http-crcsync at lists.laptop.org
> http://lists.laptop.org/listinfo/http-crcsync



