[Http-crcsync] General comments on crcsync document

Mon Dec 28 09:40:52 EST 2009

I have (finally) been reading up on the latest spec and understand now where 
the complication comes from. 

It is because the server, in the current proposal, has to guess the block-size 
that the client used from the filesize and the number of hashes specified in 
the request.

If we want to use a hash for the trailing block, the logic to use the same 
block-size on client and server is a little bit tricky due to the fact that 
the trailing block can have a zero size, implying that the number of complete 
blocks is sometimes the same as the number of hashes and sometimes one block 
less.

I don't know how to express in a single mathematical formula how the client 
and server should behave, but algorithmically I would do it as illustrated in 
the examples below (inspired by Toby's example).

Example 1:

The client wants to make 40 full blocks while the file is 40k + 2 bytes.

The client will use (full-block-size = floor(filesize / #full-blocks) = 1k) 
bytes for the complete blocks, and the trailing block size will be 
(trailing-size = filesize % #full-blocks = 2 bytes).

So this will give 40 1k blocks and one 2-byte block, for a total of 41 hashes 
in the request.

Example 2:

The client wants to make (again) 40 full blocks while the file is 40k bytes.

The client will use (full-block-size = floor(filesize / #full-blocks) = 1k) 
bytes for the complete blocks, and the trailing block size will be 
(trailing-size = filesize % #full-blocks = 0 bytes).

So this will give 40 1k blocks but there will not be a trailing block. So in 
total, there will be 40 hashes in the request.
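
As a rough sketch, the client-side arithmetic of the two examples could look 
like this in Python. The function name client_split is made up for 
illustration and is not part of the crcsync code, and 1k is assumed to mean 
1024 bytes.

```python
def client_split(filesize, n_full_blocks):
    """Return (full_block_size, trailing_size, n_hashes) as the client sees it."""
    # floor(filesize / #full-blocks)
    full_block_size = filesize // n_full_blocks
    # bytes left over after the full blocks
    trailing_size = filesize % n_full_blocks
    # a hash for the trailing block is only sent when it is non-empty
    n_hashes = n_full_blocks + (1 if trailing_size else 0)
    return full_block_size, trailing_size, n_hashes


print(client_split(40 * 1024 + 2, 40))  # Example 1: (1024, 2, 41)
print(client_split(40 * 1024, 40))      # Example 2: (1024, 0, 40)
```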

In both examples, the server has to guess from the number of hashes and from 
the filesize whether there is a trailing block or only complete blocks.

So the server would have to say something like:

if (filesize % #hashes == 0)
{
  // no trailing block: every hash covers a complete block
  full-block-size = filesize / #hashes;
  n-complete-blocks = #hashes;
  trailing-block-size = 0;
}
else
{
  // the last hash covers the trailing block
  full-block-size = floor(filesize / (#hashes-1));
  n-complete-blocks = #hashes - 1;
  trailing-block-size = filesize % (#hashes-1);
  assert(trailing-block-size != 0); // otherwise the client did something weird
}

This would cover both cases (with and without trailing block).
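
The same server-side logic, written as a minimal runnable Python sketch. The 
name server_guess is hypothetical, and it is assumed that the request always 
carries at least one hash.

```python
def server_guess(filesize, n_hashes):
    """Return (full_block_size, n_complete_blocks, trailing_block_size)."""
    if filesize % n_hashes == 0:
        # no trailing block: every hash covers a complete block
        return filesize // n_hashes, n_hashes, 0
    # otherwise the last hash is assumed to cover the trailing block
    n_complete = n_hashes - 1
    full_block_size = filesize // n_complete
    trailing = filesize % n_complete
    if trailing == 0:
        # corresponds to the assert in the pseudocode above
        raise ValueError("hash count does not match file size")
    return full_block_size, n_complete, trailing


print(server_guess(40 * 1024 + 2, 41))  # Example 1: (1024, 40, 2)
print(server_guess(40 * 1024, 40))      # Example 2: (1024, 40, 0)
```

One caveat with this guess: it does not recover the client's split for every 
input. For instance, a 20-byte file split by the client into 3 full blocks of 
6 bytes plus a 2-byte trailing block gives 4 hashes, and since 20 % 4 == 0 
the server would assume 4 blocks of 5 bytes.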

The alternative, as Toby said, is indeed to calculate hashes only for complete 
blocks and always treat the trailing block (if any) as a literal block. In 
that case, the client should be careful to pass only complete blocks to the 
crc-library, and the server could simply use the 
logic 'blocksize=floor(filesize/#hashes)' and not worry about the trailing 
block. But given that Rusty's CRC library properly supports a trailing block, 
I propose to use it, even though it makes the logic to determine the total 
number of hashes a little more complex.
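
For comparison, a small Python sketch of the simpler alternative, where only 
complete blocks are hashed and the remainder is sent as a literal. The name 
full_blocks_only is hypothetical. Client and server would run the same 
calculation, since the number of hashes always equals the number of complete 
blocks.

```python
def full_blocks_only(filesize, n_hashes):
    """Return (block_size, literal_tail_size) when only full blocks are hashed."""
    # blocksize = floor(filesize / #hashes)
    block_size = filesize // n_hashes
    # whatever is left is never hashed and always travels as a literal
    literal_tail = filesize - block_size * n_hashes
    return block_size, literal_tail


print(full_blocks_only(40 * 1024 + 2, 40))  # (1024, 2)
print(full_blocks_only(40 * 1024, 40))      # (1024, 0)
```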

Cheers,
Alex

> If we use last_block_size = file % block_count the final block will have a
> maximum size of block_count, so if we have 40 blocks for 39k + 2 byte file
> then we have 39 1k blocks and a trailing block of size 2 bytes. Another
> option is simply to drop the trailing blocks and they will always be
> returned as a literal.
> 
> Feel free to correct my maths if I am missing something...
> 
> Toby
> 
> 2009/11/2 Rusty Russell <rusty at rustcorp.com.au>
> 
> > On Thu, 29 Oct 2009 06:11:53 am Toby Collett wrote:
> > > The current version in git now implements the standard document
> > > completely as far as I am aware (doc is available from git
> > >
> > > http://repo.or.cz/w/httpd-crcsyncproxy.git?a=tree;f=crccache/doc;h=37d90acd37bb0199a37e6d6a779c37c4f37da29b;hb=HEAD
> > > )
> > >
> > > So now we need some testing, not sure the best way to do this, Martin,
> > > do you want to set up access to a server?
> > >
> > > Rusty: There was an assertion that tailsize be < block-size in the crc
> > > code. The latest version has tail_size = blocksize + remainder. It seems
> > > to work when that assertion is removed and I couldnt see any reason why
> > > it can't be greater in the current implementation. Could you confirm?
> >
> > There's no real reason, but it seems wrong.  if tailsize > blocksize, why
> > isn't there simply one more block?
> >
> > Cheers,
> > Rusty (who hasn't really been paying any attention)
>