[Http-crcsync] General comments on crcsync document
Alex Wulms
alex.wulms at scarlet.be
Sun Jul 12 10:37:01 EDT 2009
On Friday 10 July 2009, Patrick McManus wrote:
> On Thu, 2009-07-09 at 23:31 +0200, Alex Wulms wrote:
> > Hi Patrick,
> >
> > Sorry, I just realised I got my formulas mixed up. I was starting from
> > the situation that we know the filesize and the blocksize but in reality
> > we know the filesize and the number of blocks, including the trailing
> > block.
>
> well that's what I'm getting at. you're going to come up with some rule
> that basically says each non trailing block must be equal sized and as
> large as possible so that the trailing chunk is smaller than the number
> of blocks in the set. right?
>
> It is deterministic, true. But unnecessarily complex.
>
> Why not just send both the file and block sizes in the if-block header?
> That's certainly a more compact representation and then if any firewall
> strips unrecognized request headers on you (which they will) at least it
> is an all or nothing proposition.
>
> If-Block: fs=45, bs=20, crc1, crc2, crc3
Good idea. In the prototype code we were using a fixed number of complete
blocks and one optional trailer block. So in that implementation, the
blocksize was derived from the filesize (e.g. blocksize =
filesize / number-of-complete-blocks). I was still thinking that if we go for
a flexible number of blocks, we should still derive the blocksize from the
filesize.
However, now that the number of blocks is flexible, we can indeed also drive it
from the other direction: the client decides on a blocksize, by whatever means,
and we derive the number of blocks from that blocksize. Your proposal fully
supports this idea.
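As a minimal sketch of that second direction, assuming the header looks like
the "fs=45, bs=20, crc1, crc2, crc3" example above: the function names are made
up for illustration, the header syntax is only the proposal under discussion,
and the CRC tokens are treated as opaque strings. None of this is taken from
the prototype code.

```python
def block_layout(file_size, block_size):
    """Return (complete_blocks, trailing_size) for a client-chosen block size."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    # Integer division gives the complete blocks, the remainder is the trailer.
    return divmod(file_size, block_size)


def expected_crc_count(file_size, block_size):
    """Number of CRCs the header should carry: one per block, trailer included."""
    complete_blocks, trailing_size = block_layout(file_size, block_size)
    return complete_blocks + (1 if trailing_size else 0)


def parse_if_block(value):
    """Split a hypothetical If-Block value into (fs, bs, list of CRC tokens)."""
    params = {}
    crcs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, val = item.partition("=")
        if sep and key.strip() in ("fs", "bs"):
            params[key.strip()] = int(val)
        else:
            # Anything that is not fs=... or bs=... is kept as an opaque CRC.
            crcs.append(item)
    return params["fs"], params["bs"], crcs


fs, bs, crcs = parse_if_block("fs=45, bs=20, crc1, crc2, crc3")
# 45 bytes at 20 bytes per block: 2 complete blocks plus a 5-byte trailer.
assert block_layout(fs, bs) == (2, 5)
assert len(crcs) == expected_crc_count(fs, bs)
```

With both sizes in the header, the receiver can check that the number of CRCs
matches the layout before doing any other work.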
> But from an efficiency standpoint it's definitely the wrong choice for
> lots of reasons. for example, a 15 byte buffer would force copying in
> order to get proper alignment for the hash calculation through all the
> data on the 2nd, 3rd and 4th blocks. no such problem exists with a 20
> byte block size.
>
> indeed, as a way of protecting servers, you might even mandate a block
> size (for non trailing blocks) that is a multiple of something
> comfortable for your hash algorithm. I don't see value in letting a
> client pick something hard to work with when the multiple you will
> choose (probably 4?) will have plenty of granularity.
I'm not familiar enough with the crcsync library to judge whether byte
alignment has any impact on performance. Does this not also depend on the
processor architecture and on other hardware that might be used to accelerate
the crc calculation? Should we make such hardware-dependent optimizations
part of the specification? What would in practice be a good multiple if we
settle for the proposed 60-bit crc, knowing that 64-bit CPUs are becoming more
and more mainstream and that it is just a matter of time before the first
128-bit CPUs hit the street?
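To make the question concrete, the rule would amount to a check like the one
below. This is only a sketch: the default multiple of 4 is the tentative value
from the quoted suggestion, the function names are made up, and it says nothing
about whether alignment actually matters for the crcsync library.

```python
def is_acceptable_block_size(block_size, multiple=4):
    """True if block_size is a positive multiple of `multiple` (assumed rule)."""
    return block_size > 0 and block_size % multiple == 0


def round_up_block_size(block_size, multiple=4):
    """Smallest multiple of `multiple` that is >= block_size."""
    # Ceiling division using only integer arithmetic.
    return -(-block_size // multiple) * multiple


# The two block sizes from the quoted example:
assert not is_acceptable_block_size(15)
assert is_acceptable_block_size(20)
assert round_up_block_size(15) == 16
```

Changing the multiple to 8 or 16 is a one-parameter change here, which is why
the choice of value matters more to the specification than to the code.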
Cheers,
Alex