[Http-crcsync] Apache proxy CRCsync
Alex Wulms
alex.wulms at scarlet.be
Mon Mar 23 17:05:57 EDT 2009
Hi Toby,
I did not have much time last week to work on the project but want to continue
again this week. I have been thinking about the integration between the
standard cache module of Apache and the crccache_client cache-handler module.
At this moment, the cache module unfortunately does not invoke crccache_client
for most dynamic pages: the web/application servers indicate that those pages
are not cacheable, either by setting an expiry time in the past, by setting
appropriate Cache-Control headers, or by a combination of the two. The cache
module respects that, as a good HTTP citizen. But the whole idea of crccache
is that crccache_client should store those pages anyway, and that they then
get refetched and delta'd by the crccache_client/crccache_server chain on
the next request.
So one way or another crccache_client/crccache_server should trick the cache
module into caching those dynamic pages.
I see two potential ways to make this happen:
Option 1)
crccache_client (or crccache_server?) modifies the cache-related response
headers before returning the response to the cache module. It would modify
the headers in such a way that the cache module decides that the page must
be cached but revalidated at the next request. This would require no
modifications to the cache module, but I consider it a not-so-clean hack,
because we would have to reverse engineer the cache module to understand
when to modify the headers and when to leave them alone. That is obviously
fragile, because future enhancements to the cache module could break such
logic.
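
To make Option 1 a bit more concrete, here is a rough Python sketch of the
header rewrite (not Apache module code). The set of directives it treats as
"uncacheable" and the replacement value "max-age=0, must-revalidate" are
assumptions for illustration only; which headers the cache module really
keys on is exactly the reverse-engineering problem described above.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Assumed set of directives that make a cache skip the response.
UNCACHEABLE_DIRECTIVES = {"no-store", "no-cache", "private", "max-age=0"}
CACHE_HEADERS = {"cache-control", "expires", "pragma"}


def is_marked_uncacheable(headers, now):
    """Return True if the response headers tell caches not to reuse the page."""
    lowered = {k.lower(): v for k, v in headers.items()}
    directives = {
        d.strip().lower() for d in lowered.get("cache-control", "").split(",")
    }
    if directives & UNCACHEABLE_DIRECTIVES:
        return True
    expires = lowered.get("expires")
    if expires is None:
        return False
    try:
        when = parsedate_to_datetime(expires)
    except (TypeError, ValueError):
        # Invalid dates such as "0" are treated as already expired.
        return True
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return when <= now


def rewrite_for_store_and_revalidate(headers, now):
    """Replace the cache headers so the page is stored but stale at once."""
    if not is_marked_uncacheable(headers, now):
        return dict(headers)
    out = {k: v for k, v in headers.items() if k.lower() not in CACHE_HEADERS}
    out["Cache-Control"] = "max-age=0, must-revalidate"
    return out
```

Note that the original headers are lost in this sketch; a real implementation
would presumably have to keep them somewhere so that the browser still sees
what the origin server sent.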
Option 2)
We introduce some new header(s) that crccache can inject in the response to
indicate to the cache module that the pages will be cached by a
delta/crcsync-aware cache handler. We then adapt the cache module itself to
understand this new header and to cache normally-not-cacheable pages if the
header is present (and to send them for revalidation to the crccache handler
upon the next request). I see this as a cleaner solution. Before starting to
implement it, though, I believe it should be analysed in a little more
detail, especially with respect to the future, when crccache will talk to
some servers that are crcsync-aware and can handle the encoding themselves,
while at the same time still talking to many current-gen servers that do not
know this HTTP extension.
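
As a starting point for that analysis, the decision the adapted cache module
would make under Option 2 could look roughly like the Python sketch below.
The header name "X-CRCCache-Delta" and its value "yes" are purely
hypothetical placeholders; no such header has been defined yet.

```python
# Hypothetical header injected by crccache; the name is an assumption.
DELTA_HEADER = "x-crccache-delta"


def cache_decision(headers, normally_cacheable):
    """Decide what the cache module should do with a response.

    Returns "store" for ordinary cacheable pages,
    "store-and-revalidate" for pages that are only cached because a
    delta-aware handler asked for it, and "skip" otherwise.
    """
    if normally_cacheable:
        return "store"
    lowered = {k.lower(): v for k, v in headers.items()}
    if lowered.get(DELTA_HEADER, "").strip().lower() == "yes":
        return "store-and-revalidate"
    return "skip"
```

The point of the extra return value is that pages cached only because of the
new header must never be served without going back through the crccache
handler first.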
What are your thoughts on this subject?
Thanks and kind regards,
Alex
On Monday 16 March 2009, Toby Collett wrote:
> Great to hear you got it running. Unfortunately I only have about a two
> week head start on you with regard to the Apache front, so I am sure lots
> of things will get neater as we go along.
>
> 2009/3/16 Alex Wulms <alex.wulms at scarlet.be>
>
> > Hi Toby,
> >
> > I managed to get it working on my PC under SUSE 11.1 with Apache 2.2.10.
> >
> > When I configured a dedicated debug log per virtual server, I noticed
> > that the crccache_client and crccache_server modules were both invoked
> > in both virtual servers. Judging from the error log you sent me, that is
> > also the case on your server.
> >
> > I have made the following changes to fix the situation:
> >
> > 1) Move the 'CacheEnable crccache_client' directive (for the 'default'
> > virtual server) inside the <VirtualHost> tag. Apparently it is applied
> > globally as long as it is outside the <VirtualHost> tag, regardless of
> > the config file in which it appears.
>
> Seems like a sensible change.
>
> > 2) Introduce a new directive 'CRCCacheServer on'.
> > This directive is checked by mod_crccache_server in the
> > crccache_server_header_parser_handler.
> > It is specified in the <VirtualHost> tag of the upstream_proxy of the
> > virtual server.
> > Apparently modules get loaded globally and functions like
> > the ..._header_parser_handler get invoked for each virtual server, so
> > they must check for themselves whether they should be enabled or
> > disabled in a given virtual server. I found this through Google, which
> > pointed me to a forum where somebody else had faced a similar problem.
>
> Makes sense
>
> > I also realize why I only found cached files
> > under /var/cache/apache2/mod_crccache_server and not under ..._client.
> > It is because the crccache_client.conf and crccache_server.conf files
> > both use the parameter CacheRoot to set the cache directory. These
> > parameters are apparently also global. The fact that they are in two
> > different config files does not automagically put them in a
> > module-specific namespace. So I have renamed the parameters to
> > differentiate between the client and the server module.
>
> Actually only the client end should need the CacheRoot at all; the server
> side doesn't need caching. You could configure a standard Apache
> cache if you wanted, but it probably won't gain much.
>
> > I have also noticed that, although the server module reads these
> > parameters, they don't actually get used by the current code. Are they
> > there due to copy&paste, or are they already there for future
> > enhancements, in order to cache stuff temporarily on the server side?
>
> Just copy and paste I guess. I think I left them there so I have something
> to base other parameters on if we need them server side.
>
> > I have pushed my changes to the repository. Please review them. I'm still
> > new to Apache development so I might have misinterpreted some things.
> >
> > Thanks and kind regards,
> > Alex