[Http-crcsync] General comments on crcsync document

Thu Jul 9 21:36:12 EDT 2009

> 
> more related comments on the spec.. it took me a few minutes to figure
> out that the if-block hashes are crc32's. the document just calls them
> hashes or crcs. I had to go to the code to find out it was crc 32. So
> that should really be documented. 

I've read some more code in the repo just now, and I see that it really
is crc-60.. so I need to walk back a little of that previous message I
sent about it being crc-32. There are outdated code comments that say it
is crc-32, which is what led me down that path.

Is it being calculated as 64 bits and just masked off on each pass?

using normal b64 rules, I also figure the 60 bits need to be padded out
to 72bits in order to generate 12 ascii characters.. but the last 2
chars just represent those 12 bits of pad and they are dropped from the
message header.. This is stuff that really ought to be written in the
doc so others don't have to reverse engineer it too. I think it isn't
the normal way to show the b64 string (which would always be a multiple
of 4 characters), so you might hesitate before standardizing it, and at
a minimum show an example or two.
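
To make the padding arithmetic above concrete, here is a minimal sketch
of what I think is going on. It is an assumption reverse engineered from
the description above, not something the document states: the 60 bit
value is left-aligned in 72 bits, run through ordinary base64, and the 2
trailing characters that carry only pad bits are dropped. The function
name encode_crc60 is made up for illustration.

```python
import base64

def encode_crc60(value: int) -> str:
    """Hypothetical encoding: 60-bit value -> 10 base64 characters."""
    if not 0 <= value < (1 << 60):
        raise ValueError("value does not fit in 60 bits")
    padded = value << 12                 # 60 data bits + 12 zero pad bits = 72 bits
    raw = padded.to_bytes(9, "big")      # 72 bits = 9 bytes
    full = base64.b64encode(raw).decode("ascii")  # 12 characters, no '=' needed
    return full[:-2]                     # drop the 2 characters that only carry pad

example = encode_crc60(0x123456789ABCDEF)
print(example, len(example))
```

Since 60 is a multiple of 6, the 10 characters that remain carry exactly
the 60 bits, which is why the result is not a multiple of 4 characters
the way a normal b64 string would be.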

But bigger picture I gotta say, a 60 bit hash is more than a little
unusual. From an implementation standpoint a lot of folks are just going
to have crc16 and crc32 libraries and not have any easy way to perform
that calculation and that won't help adoption of the spec.. so I'd like
to see a little language in the document justifying the need for it and
explaining why it is 60 and not 32 (or even 64). 

Is there a strong basis for 60, or is it just "more than 32"? And was 32
shown to have problems significant enough to warrant the change? (The
strong sha is wrapped around the whole thing should it collide, after
all.)

Heck, Intel added CRC-32 as an instruction in SSE4.2, in Nehalem and
later, as a potential offload. It's also in hardware on the Cavium
Nitrox processors and (I think) other security processors you often find
web appliances built around.. I don't think they're adding crc-60 in
hardware any time soon ;)

If you're looking at a 40 in 4 billion chance (say with 40 crc-32
blocks) that's roughly 1 in 100 million that you have to redo the
request without a delta. Big deal. Is 1 in 100 million really enough of a
performance problem (and because of the sha it is only a performance
problem, not a correctness one) to justify going away from a widely
deployed and available algorithm such as crc32? I think this is a pretty
strong argument for doing 32 bit hashes on the blocks.
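
The back-of-envelope number above can be checked with a few lines. This
is only a sketch under my own simplifying assumptions: the hash values
are uniformly random and there is exactly one comparison per block. The
function name false_match_probability is made up for illustration.

```python
import math

def false_match_probability(blocks: int, bits: int) -> float:
    """Chance that at least one of `blocks` comparisons falsely matches."""
    p_single = 2.0 ** -bits
    # 1 - (1 - p)^n, written with log1p/expm1 so tiny p does not round away
    return -math.expm1(blocks * math.log1p(-p_single))

for bits in (32, 60):
    p = false_match_probability(40, bits)
    print(f"{bits}-bit hash, 40 blocks: about 1 in {1 / p:.3g}")
```

For 32 bits and 40 blocks this works out to about 2^32 / 40, so a bit
over 1 in 100 million.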

in a somewhat related thought:

"In case of a mismatch, the crcsync cache client should return an error
condition to the classical cache client and discard the original
instance from it's local store, to prevent the same error from
re-occurring when the user retries the request. "

I'm not sure this specification has any business telling a cache that it
should discard legitimate instances from its local store. There
might very well be administrative policy in place pinning them there! A
more appropriate remedy would be prohibiting said client from sending
the same if-block sequence for a subsequent request to the same resource
without getting a successful transaction in between. Even changing the
block size on the next request would cure a crc conflict.
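
A rough sketch of the remedy I have in mind, just to show it is cheap to
do on the client side. Everything here is hypothetical: RetryGuard and
block_hashes are names I made up, and zlib.crc32 is only a stand-in for
whatever block hash the spec ends up with.

```python
import zlib

def block_hashes(body: bytes, block_size: int) -> tuple:
    """Per-block hashes of a cached instance (crc32 used as a stand-in)."""
    return tuple(
        zlib.crc32(body[i:i + block_size])
        for i in range(0, len(body), block_size)
    )

class RetryGuard:
    """Remembers if-block sequences that failed validation, per resource."""

    def __init__(self):
        self._failed = {}  # url -> set of hash sequences that failed

    def record_failure(self, url, hashes):
        self._failed.setdefault(url, set()).add(hashes)

    def record_success(self, url):
        # a successful transaction clears the restriction for this resource
        self._failed.pop(url, None)

    def may_send(self, url, hashes):
        return hashes not in self._failed.get(url, set())

guard = RetryGuard()
body = b"cached instance " * 64           # 1024 bytes, stays in the store
first = block_hashes(body, 256)
guard.record_failure("/resource", first)  # sha mismatch on the response
retry = block_hashes(body, 128)           # same instance, new block size
print(guard.may_send("/resource", first), guard.may_send("/resource", retry))
```

The cached instance never leaves the store; only the failed if-block
sequence is barred until a transaction succeeds.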

-me again