[Http-crcsync] General comments on crcsync document

Patrick McManus mcmanus at ducksong.com
Wed Jul 15 19:22:15 EDT 2009


I think that sounds very good. Big improvement. The spec should probably
state that the hashes are byte aligned before encoding. That is obvious
for crc32, but it becomes useful information if someone wants to use a
60-bit extension.
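
A rough sketch of what byte alignment could mean for a hash width that is
not a multiple of 8. Everything here is an assumption for illustration, not
something the draft specifies: the function name, the choice to pad each
hash on the high-order side, and the big-endian layout.

```python
import base64


def encode_hashes(hashes, bits):
    """Base64-encode hashes, each padded out to a whole number of bytes.

    Assumption: a hash of `bits` bits occupies ceil(bits / 8) bytes in
    network byte order, with the unused high-order bits set to zero.
    """
    width = (bits + 7) // 8  # 32 bits -> 4 bytes, 60 bits -> 8 bytes
    raw = bytearray()
    for h in hashes:
        if h < 0 or h >> bits:
            raise ValueError("hash does not fit in %d bits" % bits)
        raw += h.to_bytes(width, "big")
    return base64.b64encode(bytes(raw)).decode("ascii")
```

With this layout a 60-bit hash costs 8 bytes on the wire before base64, so
a decoder can split the byte array without tracking bit offsets.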

Thanks,
Patrick


On Wed, 2009-07-15 at 23:11 +0200, Alex Wulms wrote:
> Hi,
> 
> How about making crc32c the default, which any crcsync server and client 
> must support, while leaving room in the protocol for other algorithms to be 
> added in future? That would help if pilot testing by early adopters shows 
> that crc32c gives too many clashes in practice, or if crc32c works fine for 
> most people but turns out to be too weak in a specific deployment with a 
> specific usage scenario.
> 
> E.g. the client indicates the algorithm in the If-Block header, and the 
> server may list all supported algorithms, in order of preference and each 
> with its preferred blocksize multiple, in the capabilities header. If the 
> server only indicates that it is capable of crcsync, the client must assume 
> that the server only supports crc32c and does not care about the exact 
> blocksize. However, if the server prefers a certain blocksize multiple, it 
> indicates that together with the algorithm. The client should then respect 
> that.
> 
> Regarding the encoding of the check-sums:
> We could store them in network-byte-order in memory (4 bytes per checksum in 
> case of crc32) and then convert that array of bytes into a base64 encoded 
> string.
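
A minimal sketch of the encoding described above: one crc32c per block, 4
bytes each in network byte order, concatenated and base64 encoded. The
function names and the way the data is cut into blocks (fixed size, shorter
last block) are assumptions for illustration. The bitwise crc32c is only
there to keep the example self-contained; a real implementation would use a
table-driven or hardware version.

```python
import base64
import struct


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def encode_checksum_list(data: bytes, block_size: int) -> str:
    """Return the base64 string for the crc32c of each block of `data`."""
    raw = b"".join(
        struct.pack(">I", crc32c(data[i:i + block_size]))  # ">I" = big-endian u32
        for i in range(0, len(data), block_size)
    )
    return base64.b64encode(raw).decode("ascii")


# Standard CRC-32C check value for the ASCII string "123456789".
assert crc32c(b"123456789") == 0xE3069283
```

For 45 bytes of data and a block size of 20 this gives 3 checksums, i.e. 12
bytes, which base64 turns into exactly 16 characters with no padding.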
> 
> Putting it all together:
> 
> Example of request header from the client (with 3 blocks, csl stands for 
> checksum list):
> 
> If-Block: alg=crc32c, fs=45, bs=20, csl=aaaabbbbccccdddd
> 
> Example of capability header from the server, supporting crc32c and md-mumble, 
> preferring a blocksize multiple of 4 for both algorithms:
> 
> Capability: crcsync, alg=crc32c; m=4, alg=md-mumble; m=4
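
A hedged sketch of how a peer might parse the two example header values
above. The grammar is inferred from the examples only (comma-separated
items, `;` between an algorithm and its `m` parameter), and the function
names are made up for illustration. Treating `fs` and `bs` as integers is
likewise an assumption.

```python
def parse_if_block(value: str) -> dict:
    """Parse e.g. 'alg=crc32c, fs=45, bs=20, csl=aaaabbbbccccdddd'."""
    fields = {}
    for part in value.split(","):
        # Split on the first '=' only, so base64 padding in csl survives.
        key, _, val = part.strip().partition("=")
        fields[key] = val
    for key in ("fs", "bs"):
        if key in fields:
            fields[key] = int(fields[key])
    return fields


def parse_capability(value: str) -> dict:
    """Parse e.g. 'crcsync, alg=crc32c; m=4, alg=md-mumble; m=4'.

    Returns {algorithm: blocksize multiple or None} in the server's order
    of preference, or {} if crcsync is not advertised at all.
    """
    items = [item.strip() for item in value.split(",")]
    if "crcsync" not in items:
        return {}
    algorithms = {}
    for item in items:
        if item == "crcsync":
            continue
        params = dict(p.strip().partition("=")[::2] for p in item.split(";"))
        if "alg" in params:
            algorithms[params["alg"]] = int(params["m"]) if "m" in params else None
    # A bare 'crcsync' means: default algorithm, no blocksize preference.
    return algorithms or {"crc32c": None}
```

Python dicts keep insertion order, so the first key of the result is the
server's most preferred algorithm.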
> 
> 
> Or am I now overcomplicating things?
> 
> Cheers,
> Alex
> 
> 
> On Monday 13 July 2009, Patrick McManus wrote:
> > On Sun, 2009-07-12 at 16:37 +0200, Alex Wulms wrote:
> > >  Should we make such hardware dependent optimizations
> > > part of the specification?
> >
> > Well, you absolutely do want to take into account the realities and
> > trends of machine organization, sure. That's not the same thing as
> > optimizing for one specific implementation.
> >
> > crc-32c is ubiquitous in both software and hardware, in part because it
> > is well suited to a broad range of architectures. It is even good enough
> > for iSCSI, and largely due to the fairly wide adoption of that protocol
> > you are seeing crc32c implemented in commodity hardware (SSE4.2 on
> > Nehalem). I suggest you ride that curve instead of pioneering a new
> > path.
> >
> > I would argue that if you think 32 bits isn't enough, you should embrace
> > a different standardized hash (md-mumble or sha-mumble) instead of
> > something like crc-60, even though those hashes are more expensive. Those
> > algorithms also have widely available, well-optimized software
> > implementations and are often implemented in hardware for the
> > network and security processors generally used to build network
> > appliances, and that is one kind of platform I would really expect to
> > see crcsync widely deployed on, server side.
> >
> > In any event, 32 bits doesn't worry me a bit (ha ha! :)). Primarily
> > because crcsync does not rely on it for correctness: it has that
> > overarching sha-256 to (more or less) guarantee correctness. It's also
> > possible that false positives become rarer because the data being
> > compared is not independent (i.e. it's not a random byte stream, it's
> > another revision of the same URI, which likely looks a lot like the
> > revision you want). CRCs don't give uniformly random distributions,
> > quite on purpose (i.e. they make lousy hash table functions).
> >
> > So yes, if you use 32 bits, repeat that process for 40 blocks in a
> > transaction, and then have several dozen million transactions, the odds
> > are good that one person will have the sha-256 invalidate their
> > transaction and have to repeat what was a stateless, non-reliable
> > transaction (i.e. HTTP) anyhow. That sounds ok to me, all things
> > considered.
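
The back-of-the-envelope arithmetic above can be sketched as follows. This
uses a deliberately simple model that is an assumption, not part of the
proposal: each of the 40 block checks is treated as an independent trial
with a 2^-32 chance of a false match, which ignores the correlation between
revisions mentioned earlier. The function names are illustrative.

```python
import math


def p_transaction_fails(blocks: int = 40, bits: int = 32) -> float:
    """Chance that at least one of `blocks` checks is a false match."""
    p_block = 2.0 ** -bits
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(blocks * math.log1p(-p_block))


def p_any_failure(transactions: int, blocks: int = 40, bits: int = 32) -> float:
    """Chance that at least one of `transactions` has to be repeated."""
    p_tx = p_transaction_fails(blocks, bits)
    return -math.expm1(transactions * math.log1p(-p_tx))


for n in (12_000_000, 36_000_000, 107_000_000):
    print(n, round(p_any_failure(n), 3))
```

Under this model one transaction in roughly 2^32 / 40 (about 107 million)
is expected to hit a false match, so the sha-256 check rejecting one
transaction becomes likely once the count reaches that order of magnitude.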
> 


