[Http-crcsync] Progress on implementing CRCSync client logic to base encoding on 'similar' pages in the cache

Alex Wulms alex.wulms at scarlet.be
Thu Aug 5 04:59:23 EDT 2010


I have checked in my changes to the git repository.

The 'similar page cache' info kept in memory is now dynamically
maintained. I have also further fine-tuned the logic to find the most
appropriate similar page, taking into consideration the 'vary' header
and the 'accept' versus 'content-type' header.

Regarding the vary header: it makes no sense, for example, to use a page
in French as the basis for a request that asks for a page in Dutch
through the 'Accept-Language' header, even when the pages are similar as
far as the URI is concerned. So the algorithm now checks the vary header
of each cached page and, if it is present, only selects candidate pages
whose request headers match those of the current request.
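
As an illustration of the vary check described above, here is a minimal
sketch in Python. It is not the module's actual code: the function names
and the idea of storing the original request headers next to each cached
page are assumptions made for this example.

```python
def _lower_keys(headers):
    """Normalise header names to lower case and trim the values."""
    return {name.lower(): value.strip() for name, value in headers.items()}


def vary_matches(cached_vary, cached_request_headers, new_request_headers):
    """Return True if the new request agrees with the request that produced
    the cached page on every header named in the cached page's Vary header.

    Hypothetical helper: all three arguments are assumptions of this sketch.
    """
    if not cached_vary or not cached_vary.strip():
        return True  # no Vary header: the page does not depend on request headers
    names = [n.strip().lower() for n in cached_vary.split(",") if n.strip()]
    if "*" in names:
        return False  # "Vary: *" means no other request can be assumed to match
    cached = _lower_keys(cached_request_headers)
    new = _lower_keys(new_request_headers)
    return all(cached.get(n, "") == new.get(n, "") for n in names)


def filter_candidates(candidates, new_request_headers):
    """Keep only the cached pages whose Vary constraints the request satisfies.

    Each candidate is assumed to be a dict with the keys 'vary' and
    'request_headers'.
    """
    return [
        page for page in candidates
        if vary_matches(page.get("vary", ""), page["request_headers"],
                        new_request_headers)
    ]
```

With this sketch, a cached page stored with "Vary: Accept-Language" for a
French request would be dropped as a candidate for a Dutch request, while
a page without a vary header would always remain a candidate.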

Regarding the accept/content-type header: it makes for example no sense
to use an image/gif file as template for a text/html page or vice versa.
So this is something the algorithm tries to avoid based on the accept
header of the request and the content-type header of the cached pages.
Furthermore, I have updated the server part (which sets the
'crcsync-similar' header) so that you can configure setting of the
header only for specific mime-types. The administrator of a website (or
a crcsync-server-proxy) can thus set 'similar-page' regular expressions
per mime-type, depending on the site structure, so that the header is
only returned for mime-types where it makes sense to apply a 'similar
page' logic.
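
The two checks in the paragraph above can be sketched as follows. This
is only an illustration under stated assumptions: the function names, the
rule table and the example pattern are hypothetical, q-values in the
accept header are ignored for brevity, and nothing here is taken from
the module's real configuration syntax.

```python
import re


def base_mime_type(content_type):
    """Strip parameters such as '; charset=utf-8' and lower-case the type."""
    return content_type.split(";", 1)[0].strip().lower()


def accept_allows(accept_header, content_type):
    """Client side: return True if the cached page's content type matches
    at least one media range in the request's accept header.

    Simplification of this sketch: q-values and other parameters are ignored.
    """
    if not accept_header or not accept_header.strip():
        return True  # no accept header: any type is acceptable
    mime = base_mime_type(content_type)
    main_type = mime.split("/", 1)[0]
    for item in accept_header.split(","):
        media_range = item.split(";", 1)[0].strip().lower()
        if media_range in ("*/*", mime, main_type + "/*"):
            return True
    return False


# Hypothetical per-mime-type rule table (assumption of this sketch): only
# mime types listed here get a 'similar page' regular expression at all.
SIMILAR_PAGE_RULES = {
    "text/html": re.compile(r"^/news/\d+\.html$"),
}


def similar_pattern_for(content_type, rules=SIMILAR_PAGE_RULES):
    """Server side: return the pattern configured for this mime type, or
    None when the header should not be set for responses of that type."""
    return rules.get(base_mime_type(content_type))
```

In this sketch an image/gif page is rejected as a template for a request
that only accepts text/html, and the server finds no rule (and so would
set no header) for any mime type that is absent from the table.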

The algorithm to find the most appropriate similar page can still be
further fine-tuned but it is good enough for a first test in the field,
which brings me to the next subject:

How do we plan to start piloting this module in the field? For example,
at a school that participates in OLPC, has some technically savvy staff
and is willing to spend the time to provide us with valuable feedback
and work with us on further fine-tuning everything?

Kind regards,
