[Http-crcsync] General comments on crcsync document

Wed Dec 30 03:59:36 EST 2009

Hi Guys,

Sorry to come back to this blocksize issue one more time, but I finally 
understand the idea that Toby has implemented, and it is much simpler than 
what we had previously.

The idea is to merge the trailing bytes into the last block.

Example: we have a file of 40k + 2 bytes and wish to use 40 blocks:

In the old approach, we would have 40 blocks of 1k and a trailing block of 2 
bytes, leading to 41 hashes in total.

With the new approach, we would have 39 blocks of 1k and a last block of 1k + 
2 bytes.

Or, to express it more abstractly (as Toby already did in the updated spec):

Let FS be the filesize and N the number of blocks, then:
  normal_block_size = floor(FS/N)
  last_block_size = normal_block_size + FS mod N

The size normal_block_size will be used for blocks 1...N-1.
The size last_block_size will be used for block N.
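
As a quick illustration of the formula above, here is a minimal Python sketch.
The function name block_sizes is just something made up for this mail; it is
not taken from the spec or from the crcsync code.

```python
from typing import Tuple


def block_sizes(fs: int, n: int) -> Tuple[int, int]:
    """Return (normal_block_size, last_block_size) for fs bytes split into n blocks."""
    if n <= 0:
        raise ValueError("n must be positive")
    normal = fs // n          # floor(FS/N)
    last = normal + fs % n    # trailing bytes are merged into the last block
    return normal, last


# The example from this mail: 40k + 2 bytes, 40 blocks.
normal, last = block_sizes(40 * 1024 + 2, 40)
print(normal, last)  # 1024 1026
```

Blocks 1...N-1 plus the last block always add up to FS, so no byte is left
without a hash.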

So with this new approach, it is very easy to unambiguously determine the 
blocksizes from the filesize and the number of blocks/hashes. It is easy to 
express in the specification and easy to implement, and as an added bonus the 
request and response will be slightly more compact when FS mod N != 0.
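
To make the server side concrete, here is a rough Python sketch of how the
block boundaries could be derived from nothing more than the filesize and the
number of hashes, with the hash count of the old approach for comparison. The
names hash_count_old and block_spans are hypothetical and do not appear in the
spec or in the modules.

```python
def hash_count_old(fs, n_full):
    """Old approach: n_full complete blocks plus a trailing block if bytes are left over."""
    return n_full + (1 if fs % n_full else 0)


def block_spans(fs, n_hashes):
    """New approach: (offset, length) of every block, from fs and the hash count only."""
    normal = fs // n_hashes
    spans = [(i * normal, normal) for i in range(n_hashes - 1)]
    last_offset = (n_hashes - 1) * normal
    spans.append((last_offset, fs - last_offset))  # last block absorbs the remainder
    return spans


fs = 40 * 1024 + 2
print(hash_count_old(fs, 40))   # 41
spans = block_spans(fs, 40)
print(len(spans), spans[-1])    # 40 (39936, 1026)
```

The server never has to guess whether a trailing block exists: the number of
hashes is the number of blocks.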

I propose to keep this new approach.

I'm also wondering: what is the current state of the Firefox plugin?

The only important points not yet properly implemented in the Apache crcsync 
modules are the handling of the ETag (it is not yet implemented as previously 
discussed) and the re-compression of the content by crcsync-client. I'll add 
these parts to the code.

Two other points that are not yet implemented, but that are not a prerequisite 
for broader testing, are:
1) the idea that the client should use a blocksize that is a multiple of 
whatever the server prefers and has indicated in the Capability header
2) the idea that the client can use a similar page as the basis for a delta when 
it does not yet have the requested page in its cache.
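
For point 1, a minimal Python sketch of what the client side could look like,
assuming the Capability header format from the earlier mail quoted below
("Capability: crcsync, m=8"). The helper names parse_capability_multiple and
round_block_size are hypothetical, and treating a missing m as 1 is purely an
assumption. How the rounded size should be combined with the FS/N formula
above is still open.

```python
def parse_capability_multiple(header_value):
    """Return the preferred blocksize multiple from a Capability header value,
    or None when the server does not announce crcsync."""
    parts = [p.strip() for p in header_value.split(",")]
    if parts[0] != "crcsync":
        return None
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep and key.strip() == "m":
            return int(value)
    return 1  # assumption: no m parameter means no preference


def round_block_size(desired, multiple):
    """Round desired down to a multiple of `multiple`, but never below one multiple."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    return max(multiple, (desired // multiple) * multiple)


m = parse_capability_multiple("crcsync, m=8")
print(m, round_block_size(1021, m))  # 8 1016
```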

Cheers,
Alex

On Monday 28 December 2009, Alex Wulms wrote:
> One more point: the proposal in the July message was still talking about
> indicating the hash algorithm in the header but that was before we had the
> entire debate about the algorithm and settled for CRC64-ISO. I propose to
> remove that part from the headers.
>
> Example of a request header from the client with a filesize of 45 bytes and a
> blocksize of 20 bytes for the complete blocks (implying 2 complete blocks and
> one trailing block); csl stands for checksum list:
>
> If-Block: fs=45, bs=20, csl=aaaabbbbccccdddd
>
> Example of a capability header from the server, preferring a blocksize that
> is a multiple of 8 due to hardware support:
>
> Capability: crcsync, m=8
>
> Cheers,
> Alex
>
> On Monday 28 December 2009, Alex Wulms wrote:
> > All,
> >
> > I have been reading back the mail archive and realize that a better
> > solution was already proposed for this entire topic a while back. See
> > http://lists.laptop.org/pipermail/http-crcsync/2009-July/000144.html
> >
> > I'll update the doc and the code accordingly (I have some time on my hands
> > this week)
> >
> > Cheers,
> > Alex
> >
> > On Monday 28 December 2009, Alex Wulms wrote:
> > > I have (finally) been reading up on the latest spec and understand now
> > > where the complication comes from.
> > >
> > > It is because the server, in the current proposal, has to guess the
> > > block-size that the client used from the filesize and the number of
> > > hashes specified in the request.
> > >
> > > If we want to use a hash for the trailing block, the logic to use the
> > > same block-size on client and server is a little bit tricky due to the
> > > fact that the trailing block can have a zero size, implying that the
> > > number of complete blocks is sometimes the same as the number of hashes
> > > and sometimes one block less.
> > >
> > > I don't know how to write in one mathematical formula how the client
> > > and server should behave, but algorithm-wise I would do it as
> > > illustrated in the examples below (inspired by Toby's example).
> > >
> > > Example 1:
> > >
> > > The client wants to make 40 full blocks while the file is 40k + 2 bytes
> > >
> > > Client will use (full-block-size = floor(filesize / #full-blocks) = 1k)
> > > bytes for the complete blocks
> > > And trailing block size will be (trailing-size = filesize %
> > > #full-blocks = 2 bytes)
> > >
> > > So this will give 40 1k blocks and one 2-byte block. In total 41
> > > hashes in the request.
> > >
> > > Example 2:
> > >
> > > The client wants to make (again) 40 full blocks while the file is 40k
> > > bytes
> > >
> > > Client will use (full-block-size = floor(filesize / #full-blocks) = 1k)
> > > bytes for the complete blocks
> > > And trailing block size will be (trailing-size = filesize %
> > > #full-blocks = 0 bytes)
> > >
> > > So this will give 40 1K blocks but there will not be a trailing block.
> > > So in total, there will be 40 hashes in the request.
> > >
> > > In both examples, the server has to guess from the number of hashes and
> > > from the filesize if there is any trailing block or if there are only
> > > complete blocks.
> > >
> > > So the server would have to say something like
> > >
> > > if (filesize % #hashes == 0)
> > > {
> > >   full-block-size = filesize/#hashes;
> > >   n-complete-blocks = #hashes;
> > >   trailing-block-size = 0;
> > > }
> > > else
> > > {
> > >   full-block-size = floor(filesize / (#hashes-1));
> > >   n-complete-blocks = #hashes - 1;
> > >   trailing-block-size = filesize % (#hashes-1);
> > >   assert(trailing-block-size != 0); // client did something weird
> > > }
> > >
> > > This would cover both cases (with and without trailing block).
> > >
> > >
> > > The alternative, as Toby said, is indeed to calculate hashes only for
> > > complete blocks and always treat the trailing block (if any) as a
> > > literal block. In that case, the client should be careful to pass only
> > > complete blocks to the crc-library, and the server could simply use the
> > > logic 'blocksize=floor(filesize/#hashes)' and not worry about the
> > > trailing block. But given the fact that Rusty's CRC library properly
> > > supports a trailing block, I propose to use it, despite the fact that
> > > it will make the logic to determine the total number of hashes a little
> > > bit more complex.
> > >
> > >
> > > Cheers,
> > > Alex
> > >
> > > > If we use last_block_size = file % block_count the final block will
> > > > have a maximum size of block_count, so if we have 40 blocks for 39k +
> > > > 2 byte file then we have 39 1k blocks and a trailing block of size 2
> > > > bytes. Another option is simply to drop the trailing blocks and they
> > > > will always be returned as a literal.
> > > >
> > > > Feel free to correct my maths if I am missing something...
> > > >
> > > > Toby
> > > >
> > > > 2009/11/2 Rusty Russell <rusty at rustcorp.com.au>
> > > >
> > > > > On Thu, 29 Oct 2009 06:11:53 am Toby Collett wrote:
> > > > > > > The current version in git now implements the standard document
> > > > > > > completely as far as I am aware (doc is available from git
> > > > > > > http://repo.or.cz/w/httpd-crcsyncproxy.git?a=tree;f=crccache/doc;h=37d90acd37bb0199a37e6d6a779c37c4f37da29b;hb=HEAD
> > > > > > > )
> > > > > > > So now we need some testing, not sure the best way to do this,
> > > > > > > Martin, do you want to set up access to a server?
> > > > > > >
> > > > > > > Rusty: There was an assertion that tailsize be < block-size in
> > > > > > > the crc code. The latest version has tail_size = blocksize +
> > > > > > > remainder. It seems to work when that assertion is removed and I
> > > > > > > couldn't see any reason why it can't be greater in the current
> > > > > > > implementation. Could you confirm?
> > > > >
> > > > > > There's no real reason, but it seems wrong.  If tailsize >
> > > > > > blocksize, why isn't there simply one more block?
> > > > >
> > > > > Cheers,
> > > > > Rusty (who hasn't really been paying any attention)
> > >
> > > _______________________________________________
> > > Http-crcsync mailing list
> > > Http-crcsync at lists.laptop.org
> > > http://lists.laptop.org/listinfo/http-crcsync