[Http-crcsync] General comments on crcsync document

Sun Jul 12 10:37:01 EDT 2009

On Friday 10 July 2009, Patrick McManus wrote:
> On Thu, 2009-07-09 at 23:31 +0200, Alex Wulms wrote:
> > Hi Patrick,
> >
> > Sorry, I just realised I got my formulas mixed up. I was starting from
> > the situation that we know the filesize and the blocksize but in reality
> > we know the filesize and the number of blocks, including the trailing
> > block.
>
> well that's what I'm getting at. you're going to come up with some rule
> that basically says each non trailing block must be equal sized and as
> large as possible so that the trailing chunk is smaller than the number
> of blocks in the set. right?
>
> It is deterministic, true. But unnecessarily complex.
>
> Why not just send both the file and block sizes in the if-block header?
> That's certainly a more compact representation and then if any firewall
> strips unrecognized request headers on you (which they will) at least it
> is an all or nothing proposition.
>
> If-Block: fs=45, bs=20, crc1, crc2, crc3
Good idea. In the prototype code we used a fixed number of complete
blocks plus one optional trailer block. So in that implementation, the
blocksize was derived from the filesize (i.e. blocksize =
filesize / number-of-complete-blocks). I was still thinking that, even if we
go for a flexible number of blocks, we should derive the blocksize from the
filesize.
However, now that the number of blocks is flexible, we can indeed also drive
it from the other direction: the client decides on a blocksize by whatever
means it likes, and the number of blocks is then derived from that blocksize.
Your proposal fully supports this idea.
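
To make the two directions concrete, here is a rough Python sketch. The
function names are made up for illustration and are not taken from the
prototype or from the crcsync library; the header layout simply follows the
"fs=45, bs=20" example quoted above.

```python
def layout_from_block_size(file_size, block_size):
    """Client picks block_size; derive (complete_blocks, trailer_size)."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    return divmod(file_size, block_size)


def layout_from_block_count(file_size, complete_blocks):
    """Prototype direction: the number of complete blocks is fixed;
    derive (block_size, trailer_size)."""
    if file_size < 0 or complete_blocks <= 0:
        raise ValueError("file_size must be >= 0 and complete_blocks > 0")
    block_size = file_size // complete_blocks
    trailer_size = file_size - block_size * complete_blocks
    return block_size, trailer_size


def if_block_header(file_size, block_size, crcs):
    """Build a header line in the style of the quoted example (hypothetical helper)."""
    complete, trailer = layout_from_block_size(file_size, block_size)
    expected = complete + (1 if trailer else 0)  # one extra crc for the trailer
    if len(crcs) != expected:
        raise ValueError("expected %d crcs, got %d" % (expected, len(crcs)))
    return "If-Block: fs=%d, bs=%d, %s" % (file_size, block_size, ", ".join(crcs))


if __name__ == "__main__":
    print(layout_from_block_size(45, 20))    # 2 complete blocks, 5-byte trailer
    print(layout_from_block_count(45, 2))    # 22-byte blocks, 1-byte trailer
    print(if_block_header(45, 20, ["crc1", "crc2", "crc3"]))
```

With fs=45 and bs=20 the server can work out for itself that there are two
complete blocks and a 5-byte trailer, so it knows to expect three crcs.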


> But from an efficiency standpoint it's definitely the wrong choice for
> lots of reasons. for example, a 15 byte buffer would force copying in
> order to get proper alignment for the hash calculation through all the
> data on the 2nd, 3rd and 4th blocks. no such problem exists with a 20
> byte block size.
>
> indeed, as a way of protecting servers, you might even mandate a block
> size (for non trailing blocks) that is a multiple of something
> comfortable for your hash algorithm. I don't see value in letting a
> client pick something hard to work with when the multiple you will
> choose (probably 4?) will have plenty of granularity.
I'm not familiar enough with the crcsync library to judge whether byte
alignment has any impact on performance. Doesn't this also depend on the
processor architecture and on other hardware that might be used to accelerate
the crc calculation? Should we make such hardware-dependent optimizations
part of the specification? And what would in practice be a good multiple if we
settle for the proposed 60-bit crc, knowing that 64-bit CPUs are becoming
more and more mainstream and that it is just a matter of time before the
first 128-bit CPUs hit the street?
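
Purely as an illustration of what such a rule would look like if it did end
up in the specification, here is a small Python sketch. The default multiple
of 4 is only the tentative figure from the quoted text, not a settled value,
and both function names are hypothetical.

```python
def is_acceptable_block_size(block_size, multiple=4):
    """Server-side check: non-trailing blocks must be a positive multiple."""
    return block_size > 0 and block_size % multiple == 0


def round_up_block_size(block_size, multiple=4):
    """Client-side helper: bump a preferred size up to the next multiple."""
    if block_size <= 0 or multiple <= 0:
        raise ValueError("block_size and multiple must be > 0")
    return -(-block_size // multiple) * multiple  # ceiling division


if __name__ == "__main__":
    print(is_acceptable_block_size(15))  # the 15-byte case from the quoted text
    print(is_acceptable_block_size(20))  # the 20-byte case from the quoted text
    print(round_up_block_size(15))
```

Keeping the multiple as a parameter leaves open the question of which value
suits a given crc width or CPU.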


Cheers,
Alex