[Http-crcsync] General comments on crcsync document
Patrick McManus
mcmanus at ducksong.com
Mon Jul 13 10:11:31 EDT 2009
On Sun, 2009-07-12 at 16:37 +0200, Alex Wulms wrote:
> Should we make such hardware dependent optimizations
> part of the specification?
Well, you absolutely do want to take into account the realities and
trends of machine organization, sure. That's not the same thing as
optimizing for one specific implementation.
CRC-32C is ubiquitous in both software and hardware, in part because it
is well suited to a broad range of architectures. It is even good enough
for iSCSI, and largely because of that protocol's fairly wide adoption
you are seeing CRC-32C implemented in commodity hardware (the SSE4.2
CRC32 instruction on Nehalem). I suggest you ride that curve instead of
pioneering a new path.
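For anyone who hasn't worked with CRC-32C, here is a minimal
table-driven software sketch in Python. It assumes the standard
reflected Castagnoli parameters: reflected polynomial 0x82F63B78, with
initial value and final XOR both 0xFFFFFFFF. The names `crc32c` and
`_TABLE` are mine, not from the crcsync draft. Python's `zlib.crc32`
uses the IEEE 802.3 polynomial, so it is not a drop-in substitute.
Hardware paths such as the SSE4.2 CRC32 instruction compute this same
Castagnoli CRC, only much faster.

```python
# Reflected form of the Castagnoli polynomial used by CRC-32C (iSCSI).
CRC32C_POLY_REFLECTED = 0x82F63B78


def _make_table() -> list:
    """Precompute the 256-entry lookup table for byte-at-a-time CRC-32C."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            # Shift right one bit; fold in the polynomial if a 1 fell off.
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes, crc: int = 0) -> int:
    """Return the CRC-32C of data.

    Passing a previous result as `crc` continues the computation, so
    crc32c(b, crc32c(a)) == crc32c(a + b).
    """
    crc ^= 0xFFFFFFFF  # undo the previous final XOR (or apply the initial value)
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF  # final XOR
```

`crc32c(b"123456789")` should give the standard CRC-32C check value
0xE3069283.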
I would argue that if you think 32 bits isn't enough, you should embrace
a different standardized hash (md-mumble or sha-mumble) instead of
something like CRC-60, even though those are more expensive. Those
algorithms also have widely available, well-optimized software
implementations, and they are often implemented in hardware on the
network and security processors generally used to build network
appliances. That is one kind of platform where I would really expect to
see crcsync widely deployed on the server side.
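If you did want wider block fingerprints, here is a minimal sketch of
the "standardized hash" route using Python's `hashlib`. It assumes
fixed-size blocks and SHA-256 truncated to 8 bytes (64 bits). The helper
name `block_digests` and the truncation length are illustrative
assumptions, not anything in the crcsync draft.

```python
import hashlib


def block_digests(data: bytes, block_size: int, digest_bytes: int = 8) -> list:
    """Hypothetical helper: split data into fixed-size blocks (the last
    one may be shorter) and fingerprint each block with SHA-256
    truncated to digest_bytes bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if not 1 <= digest_bytes <= 32:
        raise ValueError("digest_bytes must be between 1 and 32")
    return [
        hashlib.sha256(data[i:i + block_size]).digest()[:digest_bytes]
        for i in range(0, len(data), block_size)
    ]
```

One cost to keep in mind: a CRC can be updated cheaply as a window
slides one byte at a time, while a SHA-family digest has to be
recomputed over the whole window. So if an implementation needs to
match blocks at arbitrary offsets, this route is considerably more
expensive.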
In any event, 32 bits doesn't worry me a bit (ha ha! :)), primarily
because crcsync does not rely on it for correctness: it has that
overarching SHA-256 to (more or less) guarantee correctness. The number
of false positives may also be reduced by the fact that the candidate
data is not independent. It's not a random byte stream; it's another
revision of the same URI, which likely looks a lot like the revision
you want. CRCs quite deliberately don't give uniformly random
distributions (i.e. they make lousy hash table functions).
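The "overarching SHA-256" safety net can be sketched roughly like this.
The sketch assumes the server supplies the expected SHA-256 of the
complete response as a hex string, and that the client has already
reassembled the body from matched blocks and literal data.
`verify_or_refetch` and the `fetch_full` callback are hypothetical
names, not part of the crcsync draft.

```python
import hashlib
from typing import Callable


def verify_or_refetch(reconstructed: bytes,
                      expected_sha256_hex: str,
                      fetch_full: Callable[[], bytes]) -> bytes:
    """Return the reconstructed body if its SHA-256 matches the expected
    digest; otherwise fall back to a full, non-delta fetch.

    fetch_full is a hypothetical callback standing in for re-issuing
    the HTTP request without crcsync.
    """
    if hashlib.sha256(reconstructed).hexdigest() == expected_sha256_hex:
        return reconstructed
    # A CRC false positive spliced in the wrong block somewhere; the
    # whole-body hash catches it and we simply repeat the request.
    return fetch_full()
```

The point is that a CRC collision costs one extra round trip, not a
corrupted page.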
So yes, if you use 32 bits, repeat that check for 40 blocks in a
transaction, and then run several dozen million transactions, the odds
are good that someone will have the SHA-256 invalidate their transaction
and have to repeat it. But that was a stateless, non-reliable
transaction (i.e. HTTP) anyhow. That sounds OK to me, all things
considered.
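To put rough numbers on that, here is a back-of-the-envelope sketch. It
assumes each block comparison is an independent event that falsely
matches with probability 2^-32. As noted above, CRCs are not uniformly
distributed and revisions of the same URI are not independent, so treat
this as a baseline, not a prediction. It also counts one comparison per
block, as the paragraph above does; if an implementation compares at
more positions, n grows accordingly. With T transactions of B blocks,
the expected number of false positives is T × B / 2^32, and the chance
of at least one is 1 - (1 - 2^-32)^(T × B).

```python
import math


def expected_false_positives(transactions: int, blocks_per_txn: int = 40,
                             crc_bits: int = 32) -> float:
    """Expected false matches, assuming each block comparison falsely
    matches independently with probability 2**-crc_bits."""
    return transactions * blocks_per_txn / 2 ** crc_bits


def prob_at_least_one(transactions: int, blocks_per_txn: int = 40,
                      crc_bits: int = 32) -> float:
    """Probability that at least one comparison falsely matches, under
    the same independence assumption: 1 - (1 - p)**n."""
    p = 2.0 ** -crc_bits
    n = transactions * blocks_per_txn
    # log1p/expm1 keep precision when p is tiny.
    return -math.expm1(n * math.log1p(-p))
```

For 50 million transactions, `expected_false_positives(50_000_000)` is
about 0.47 and `prob_at_least_one(50_000_000)` is about 0.37. Per
transaction, the chance is roughly 40 / 2^32, or about 9.3e-9.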