[Http-crcsync] crccache ready for some testing I think

Alex Wulms alex.wulms at scarlet.be
Mon Mar 30 19:05:33 EDT 2009


Hi,

I have tested the cache-server with the new changes, but it still did not
work well: as soon as a mismatch was found, no further matches were found
after it.
I have now refactored the cache-server to work with a moving window that 
properly feeds the data into the crc_read_block function and tested it 
against slashdot. Now it works fine. See the check-in comments and the 
comments in the C code for further details.
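
As a rough illustration of the moving-window idea (a sketch only, not the
actual crccache code): consume_some below is a hypothetical stand-in for
crc_read_block, and WINDOW_SIZE and the 5-byte limit per call are arbitrary
values chosen for the example. The point is that the caller always tops up
the window and offers everything it holds, then slides the unconsumed tail
to the front.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WINDOW_SIZE 16	/* illustrative; a real window would span several blocks */

/* Hypothetical sink that records what the consumer has accepted so far. */
struct sink {
	char out[256];
	size_t len;
};

/* Hypothetical stand-in for a block matcher such as crc_read_block:
 * it uses some prefix of buf and returns how many bytes it consumed.
 * This stub takes at most 5 bytes per call and copies them to the sink. */
static size_t consume_some(struct sink *s, const char *buf, size_t buflen)
{
	size_t n = buflen < 5 ? buflen : 5;

	memcpy(s->out + s->len, buf, n);
	s->len += n;
	return n;
}

/* Feed src through a moving window: fill the window, offer all of it to
 * the consumer, then move the unconsumed bytes to the start. */
static void feed_with_window(struct sink *s, const char *src, size_t srclen)
{
	char window[WINDOW_SIZE];
	size_t filled = 0, pos = 0;

	while (pos < srclen || filled > 0) {
		/* Read ahead: fill whatever free space the window has. */
		size_t room = WINDOW_SIZE - filled;
		size_t avail = srclen - pos;
		size_t take = room < avail ? room : avail;
		size_t used;

		memcpy(window + filled, src + pos, take);
		filled += take;
		pos += take;

		/* Offer everything we hold, not just a single block. */
		used = consume_some(s, window, filled);

		/* Slide the unconsumed tail down to the front. */
		memmove(window, window + used, filled - used);
		filled -= used;
	}
}
```

Calling feed_with_window(&s, data, len) on a zero-initialised struct sink
should leave the same len bytes in s.out, in order, however few bytes the
consumer accepts on each call.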

I have been testing against slashdot (their home page) because it has an
advertising block early in the page (somewhere in the first 10kB; total
page size is approx. 100kB) that changes regularly while the rest of the home
page is more or less stable. So it is an ideal test case for this situation.

With the old code, the compressed size was 101% of the original when the
advertising part changed (slightly larger than the page itself, due to
block-header overhead).
With the new code, the compressed size is 5%.

PS: slashdot sets 'cache private' headers, so normally the cache module
refuses to cache its pages. I have overridden this via the configuration
parameters but don't want to submit those changes yet, as I'm still
fine-tuning. I don't want to commit test changes that might open the cache
too much for sensitive data.

Thanks and brs,
Alex


On Monday 30 March 2009, Toby Collett wrote:
> I have pulled these changes into the git repo.
>
> Thanks,
> Toby
>
> 2009/3/30 Rusty Russell <rusty at rustcorp.com.au>
>
> > On Monday 30 March 2009 07:31:38 Alex Wulms wrote:
> > > I have also investigated why the cache-server stops finding block
> > > matches after the first difference. I think it is because at this
> > > moment, the cache server does not read ahead enough in the
> > > data-stream. Basically, the crc_read_block function of Rusty must be
> > > fed with as much data as possible,
> >
> > Bug fixed.  Grab latest from ccan, or just apply this patch.
> >
> > Thanks for this: I added tests for feeding 1 byte at a time, and it
> > showed the issue immediately.
> >
> > Cheers,
> > Rusty.
> >
> > === modified file 'ccan/crcsync/crcsync.c'
> > --- ccan/crcsync/crcsync.c      2009-02-17 09:16:33 +0000
> > +++ ccan/crcsync/crcsync.c      2009-03-30 00:07:04 +0000
> > @@ -138,6 +138,7 @@
> >                goto have_match;
> >        }
> >
> > +       /* old is the trailing edge of the checksum window. */
> >        if (buffer_size(ctx) >= ctx->block_size)
> >                old = ctx->buffer + ctx->buffer_start;
> >        else
> > @@ -153,6 +154,7 @@
> >                                                    *old, *p,
> >                                                    ctx->uncrc_tab);
> >                        old++;
> > +                       /* End of stored buffer?  Start on data they gave us. */
> >                        if (old == ctx->buffer + ctx->buffer_end)
> >                                old = buf;
> >                } else {
> > @@ -167,11 +169,6 @@
> >                p++;
> >        }
> >
> > -       /* Make sure we have a copy of the last block_size bytes.
> > -        * First, copy down the old data.  */
> > -       if (buffer_size(ctx)) {
> > -       }
> > -
> >        if (crcmatch >= 0) {
> >                /* We have a match! */
> >                if (ctx->literal_bytes > ctx->block_size) {
> > @@ -187,12 +184,15 @@
> >                        assert(ctx->literal_bytes == 0);
> >                        ctx->have_match = -1;
> >                        ctx->running_crc = 0;
> > +                       /* Nothing more in the buffer. */
> > +                       ctx->buffer_start = ctx->buffer_end = 0;
> >                }
> >        } else {
> >                /* Output literal if it's more than 1 block ago. */
> >                if (ctx->literal_bytes > ctx->block_size) {
> >                        *result = ctx->literal_bytes - ctx->block_size;
> > -                       ctx->literal_bytes = ctx->block_size;
> > +                       ctx->literal_bytes -= *result;
> > +                       ctx->buffer_start += *result;
> >                } else
> >                        *result = 0;
> >
> > @@ -243,29 +243,34 @@
> >  long crc_read_flush(struct crc_context *ctx)
> >  {
> >        long ret;
> > +       size_t final;
> >
> > -       /* In case we ended on a whole block match. */
> > -       if (ctx->have_match == -1) {
> > -               size_t final;
> > -
> > -               final = final_block_match(ctx);
> > -               if (!final) {
> > -                       /* This is how many bytes we're about to consume. */
> > -                       ret = buffer_size(ctx);
> > -                       ctx->buffer_start += ret;
> > -                       ctx->literal_bytes -= ret;
> > -
> > -                       return ret;
> > -               }
> > -               ctx->buffer_start += final;
> > -               ctx->literal_bytes -= final;
> > -               ctx->have_match = ctx->num_crcs-1;
> > -       }
> > -
> > -       /* It might be a partial block match, so no assert */
> > -       ctx->literal_bytes = 0;
> > -       ret = -ctx->have_match-1;
> > -       ctx->have_match = -1;
> > +       /* We might have ended right on a matched block. */
> > +       if (ctx->have_match != -1) {
> > +               ctx->literal_bytes -= ctx->block_size;
> > +               assert(ctx->literal_bytes == 0);
> > +               ret = -ctx->have_match-1;
> > +               ctx->have_match = -1;
> > +               ctx->running_crc = 0;
> > +               /* Nothing more in the buffer. */
> > +               ctx->buffer_start = ctx->buffer_end;
> > +               return ret;
> > +       }
> > +
> > +       /* Look for truncated final block. */
> > +       final = final_block_match(ctx);
> > +       if (!final) {
> > +               /* Nope?  Just a literal. */
> > +               ret = buffer_size(ctx);
> > +               ctx->buffer_start += ret;
> > +               ctx->literal_bytes -= ret;
> > +               return ret;
> > +       }
> > +
> > +       /* We matched (some of) what's left. */
> > +       ret = -(ctx->num_crcs-1)-1;
> > +       ctx->buffer_start += final;
> > +       ctx->literal_bytes -= final;
> >        return ret;
> >  }



