[Http-crcsync] Apache proxy CRCsync

Alex Wulms alex.wulms at scarlet.be
Mon Mar 23 17:05:57 EDT 2009


Hi Toby,

I did not have much time last week to work on the project, but I want to
continue this week. I have been thinking about the integration between
Apache's standard cache module (mod_cache) and the crccache_client
cache-handler module.

At the moment, the cache module unfortunately does not invoke crccache_client
for most dynamic pages, because the web/application servers mark those pages
as not cacheable: they set an expiry time in the past, send the appropriate
Cache-Control headers, or both. The cache module respects that, as a good
HTTP citizen should. But the whole idea of crccache is that crccache_client
should store those pages anyway, and that on the next request they get
refetched and delta-encoded by the crccache_client/crccache_server chain.
So one way or another, crccache_client/crccache_server has to trick the cache
module into caching those dynamic pages.

I see two potential ways to make this happen:

Option 1)
crccache_client (or crccache_server?) modifies the cache-related response
headers before returning the response to the cache module, in such a way
that the cache module decides the page must be cached but revalidated on the
next request. This would require no modifications to the cache module, but I
consider it a not-so-clean hack, because we would have to reverse-engineer
the cache module to understand when to modify the headers and when not to.
That is obviously fragile, because future enhancements to the cache module
could break such logic.

Option 2)
We introduce one or more new headers that crccache can inject into the
response to tell the cache module that the page will be cached by a
delta/crcsync-aware cache handler. We then adapt the cache module itself to
understand this header and to cache normally-uncacheable pages when it is
present (sending them to the crccache handler for revalidation on the next
request). I see this as a cleaner solution. However, before starting to
implement it, I believe it should be analysed in a bit more detail,
especially with respect to the future, when crccache will talk to some
servers that are crcsync-aware and can handle the encoding themselves, while
still talking to many current-generation servers that do not know this HTTP
extension.

What are your thoughts on this subject?


Thanks and kind regards,
Alex





On Monday 16 March 2009, Toby Collett wrote:
> Great to hear you got it running. Unfortunately I only have about a
> two-week head start on you on the Apache front, so I am sure lots of
> things will get neater as we go along.
>
> 2009/3/16 Alex Wulms <alex.wulms at scarlet.be>
>
> > Hi Toby,
> >
> > I managed to get it working on my PC under SUSE 11.1 with Apache 2.2.10.
> >
> > When I configured a dedicated debug log per virtual server, I noticed
> > that the crccache_client and crccache_server modules were both invoked
> > in both virtual servers. Judging from the error log you sent me, that is
> > also the case on your server.
> >
> > I have made the following changes to fix the situation:
> >
> > 1) Move the 'CacheEnable crccache_client' directive (for the 'default'
> > virtual server) inside the <VirtualHost> tag. Apparently it is applied
> > globally as long as it is outside the <VirtualHost> tag, regardless of
> > the config file in which it appears.
>
> Seems like a sensible change.
>
> > 2) Introduce a new directive 'CRCCacheServer on'.
> > This directive is checked by mod_crccache_server in the
> > crccache_server_header_parser_handler.
> > It is specified in the <VirtualHost> tag of the upstream_proxy of the
> > virtual server.
> > Apparently modules get loaded globally, and functions like
> > ..._header_parser_handler get invoked for every virtual server, so the
> > modules must check for themselves whether they should be enabled or
> > disabled in a given virtual server. I found this through Google, which
> > pointed me to a forum where somebody else had faced a similar problem.
>
> Makes sense
>
> > I also realized why I only found cached files under
> > /var/cache/apache2/mod_crccache_server and not under ..._client. It is
> > because crccache_client.conf and crccache_server.conf both use the
> > parameter CacheRoot to store the cache directory. These parameters are
> > apparently also global: the fact that they are in two different config
> > files does not automagically store them in a module-specific namespace.
> > So I have renamed the parameters to differentiate between the client and
> > the server module.
>
> Actually, only the client end should need CacheRoot; the server side
> doesn't need caching at all. You could configure a standard Apache cache
> if you wanted, but it probably won't gain much.
>
> > I have also noticed that, although the server module reads these
> > parameters, the current code doesn't actually use them. Are they there
> > for copy&paste reasons, or are they already there for future
> > enhancements, in order to cache stuff temporarily on the server side?
>
> Just copy and paste, I guess. I think I left them there so I have
> something to base other parameters on if we need them server side.
>
> > I have pushed my changes to the repository. Please review them. I'm
> > still new to Apache development, so I might have misinterpreted some
> > things.
> >
> > Thanks and kind regards,
> > Alex



