[Http-crcsync] General comments on crcsync document

Patrick McManus mcmanus at ducksong.com
Thu Jul 9 18:35:32 EDT 2009


On Thu, 2009-07-09 at 23:31 +0200, Alex Wulms wrote:
> Hi Patrick,
> 
> Sorry, I just realised I got my formulas mixed up. I was starting from the 
> situation that we know the filesize and the blocksize but in reality we know 
> the filesize and the number of blocks, including the trailing block.
> 

Well, that's what I'm getting at. You're going to come up with some rule
that basically says each non-trailing block must be equal sized and as
large as possible so that the trailing chunk is smaller than the number
of blocks in the set, right?

It is deterministic, true. But unnecessarily complex.

Why not just send both the file and block sizes in the if-block header?
That's certainly a more compact representation and then if any firewall
strips unrecognized request headers on you (which they will) at least it
is an all or nothing proposition.

If-Block: fs=45, bs=20, crc1, crc2, crc3

(the trailing crc is implicitly over the remainder.. 5 bytes in this
case).
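
A minimal sketch of how a receiver could recover the block boundaries
from such a header. The "fs=..., bs=..., crc..." syntax is only the
proposal made above, not anything in the current spec, and the names
parse_if_block and block_layout are hypothetical.

```python
def parse_if_block(value):
    """Split a proposed 'fs=.., bs=.., crc, crc, ...' header value.

    Returns (file_size, block_size, list_of_crc_tokens).
    The syntax is the one suggested in this mail, not a spec.
    """
    params, crcs = {}, []
    for item in (part.strip() for part in value.split(",")):
        if "=" in item:
            key, _, num = item.partition("=")
            params[key.strip()] = int(num)
        elif item:
            crcs.append(item)
    return params["fs"], params["bs"], crcs


def block_layout(fs, bs):
    """Return (offset, length) for each block; the last may be shorter."""
    if fs < 0 or bs <= 0:
        raise ValueError("file size must be >= 0 and block size > 0")
    return [(off, min(bs, fs - off)) for off in range(0, fs, bs)]


fs, bs, crcs = parse_if_block("fs=45, bs=20, crc1, crc2, crc3")
layout = block_layout(fs, bs)
# One CRC per block, including the implicit 5-byte trailer.
assert len(layout) == len(crcs)
print(layout)
```

With fs=45 and bs=20 the layout is two 20-byte blocks followed by a
5-byte trailer, so the receiver never has to infer the block size from
a block count.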

> 
> I must think about how to specify this mathematically but I'm too tired now. 
> But in your example, 3x15 and no trailer (or a zero-size trailer) would be 
> the right one.

But from an efficiency standpoint it's definitely the wrong choice for
lots of reasons. For example, a 15 byte buffer would force copying in
order to get proper alignment for the hash calculation through all the
data on the 2nd, 3rd and 4th blocks. No such problem exists with a 20
byte block size.

Indeed, as a way of protecting servers, you might even mandate a block
size (for non-trailing blocks) that is a multiple of something
comfortable for your hash algorithm. I don't see value in letting a
client pick something hard to work with when the multiple you will
choose (probably 4?) will have plenty of granularity.
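
A sketch of what such a rule could look like on both sides, assuming
the multiple is 4 (only a guess above, nothing has been decided). The
names ALIGNMENT, round_up_block_size and is_acceptable_block_size are
hypothetical.

```python
ALIGNMENT = 4  # assumed multiple; the mail only says "probably 4?"


def round_up_block_size(bs, alignment=ALIGNMENT):
    """Client side: bump a candidate block size to the next multiple."""
    return -(-bs // alignment) * alignment


def is_acceptable_block_size(bs, alignment=ALIGNMENT):
    """Server side: reject block sizes that are not aligned."""
    return bs > 0 and bs % alignment == 0


print(round_up_block_size(15), is_acceptable_block_size(15))
print(round_up_block_size(20), is_acceptable_block_size(20))
```

A client that wanted 15-byte blocks would send 16 instead; the trailing
block is exempt from the check because it is whatever remains.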

-----

More related comments on the spec: it took me a few minutes to figure
out that the if-block hashes are CRC32s. The document just calls them
hashes or crcs. I had to go to the code to find out it was CRC32. So
that should really be documented.

But then the doc says "The hashes are 60-bits number, meaning in the
if-block, each hash occupies 10 bytes", while the if-block defines the
hashes as being base64'd.

I don't understand this at all. Aren't crc32 hashes, well, 32 bits, not 60?
And the base64 length of 4 bytes is 8. Even if they were 60 bits I
think it takes 12 bytes of b64 to represent it, so I don't see where the
10 comes from. (b64 takes 3 bytes and turns it into 4; the trailer is
padded out to a multiple of 3 with 0 bits.)

Anyhow, assuming that they really are 32 bit values I would suggest you
represent them in the if-block header as 8 digit network order hex
strings, which are the same length as the b64 encoding but a heck of a
lot more efficient to convert back to a real value than b64. Maybe you
meant to send them as "0x12345678", which is indeed 10 bytes long but has
nothing to do with b64? (Though I think we can forgo the 0x prefix and
save the bytes, personally.)
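
A small sketch comparing the two encodings for one CRC32 value, using
zlib.crc32 as a stand-in for whatever crc32 routine the code uses. The
helper names are hypothetical.

```python
import base64
import struct
import zlib


def crc_to_hex(crc):
    """8 hex digits, most significant byte first, no 0x prefix."""
    return "%08x" % (crc & 0xFFFFFFFF)


def hex_to_crc(text):
    """Decoding is a single integer parse."""
    return int(text, 16)


def crc_to_b64(crc):
    """Pack as a 4-byte network-order integer, then base64 it."""
    return base64.b64encode(struct.pack("!I", crc & 0xFFFFFFFF)).decode("ascii")


def b64_to_crc(text):
    """Decoding needs a base64 pass plus an unpack."""
    return struct.unpack("!I", base64.b64decode(text))[0]


crc = zlib.crc32(b"hello world") & 0xFFFFFFFF
hex_text, b64_text = crc_to_hex(crc), crc_to_b64(crc)

# Both forms are 8 characters on the wire and round-trip to the same value.
assert len(hex_text) == len(b64_text) == 8
assert hex_to_crc(hex_text) == b64_to_crc(b64_text) == crc
print(hex_text, b64_text)
```

Two of the eight base64 characters are "=" padding, so it saves nothing
over hex for a 32-bit value.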


HTH
-Patrick




