[Http-crcsync] Apache proxy CRCsync & mozilla gsoc project?

WULMS Alexander Alex.WULMS at swift.com
Thu Apr 2 04:39:43 EDT 2009


I just realized that the concurrency problem on the cache-client side exists anyway with dynamic pages. A dynamic page can be
different for each request. Even if GET request handling on the server side is implemented idempotently, as per HTTP convention,
the page might still change on a subsequent request due to server-side background activity.

So imagine the following scenario:

T1: User U1 issues request R1 for resource x
T2: cache-client reads the cached page, calculates checksums, sends request R1 to the cache-server
T3: cache-server forwards request R1 to the upstream server
T4: User U2 issues request R2 for resource x (so for the very same resource)
T5: cache-client reads the cached page, calculates checksums, sends request R2 to the cache-server
T6: cache-server forwards request R2 to the upstream server
T7: server spawns a thread to handle R1. This thread does not get scheduled immediately.
T8: server spawns a thread to handle R2. This thread is scheduled immediately.
T9: server finishes R2 and returns it to the cache-server
T10: cache-server returns a crccache response to the cache-client for R2. Blocks 1, 2, 3 are unmodified; blocks 4, 5, 6 are changed.
T11: cache-client constructs a response for user U2 and updates blocks 4, 5, 6 in the cached file on disk...
T12: server schedules the thread for request R1, finishes processing it and returns it to the cache-server
T13: cache-server returns a crccache response to the cache-client for R1. Blocks 1, 2 are changed, blocks 3, 4, 5 are changed, block 6 is
unmodified
T14: cache-client constructs an invalid response for user U1: among other blocks, it reads block 6 from the cached file on
disk, which was just modified by request R2
T15: Assuming we have implemented the global checksum safeguard, the cache-client sends an error to user U1. Without the global checksum
safeguard, the cache-client returns a corrupted page to user U1.
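The interleaving above can be simulated in a few lines. This is a minimal sketch, not the actual mod_crccache code: block names, sizes and the single SHA-1 standing in for the global checksum are all illustrative assumptions. The shared on-disk cached file is modeled as a mutable list of blocks that both reconstructions touch.

```python
import hashlib

# Hypothetical 6-block cached page shared by both requests.
BLOCK = 16
cache_file = [b"A" * BLOCK, b"B" * BLOCK, b"C" * BLOCK,
              b"D" * BLOCK, b"E" * BLOCK, b"F" * BLOCK]

def apply_response(cache, literal_blocks):
    """Reconstruct a page: use literal data where the server sent changed
    blocks, reuse the on-disk cache for 'unmodified' ones, and write the
    changed blocks back to the shared cache (as the cache-client does at T11)."""
    page = []
    for i in range(len(cache)):
        if i in literal_blocks:
            page.append(literal_blocks[i])
            cache[i] = literal_blocks[i]   # mutate the shared cached file
        else:
            page.append(cache[i])          # trust the cached copy
    return b"".join(page)

# T10/T11: response for R2 -- blocks 4, 5, 6 changed (0-based indices 3, 4, 5).
r2_page = apply_response(cache_file,
                         {3: b"d" * BLOCK, 4: b"e" * BLOCK, 5: b"f" * BLOCK})

# T13: the server built R1's delta against the ORIGINAL page, so block 6
# is "unmodified" relative to b"F" * BLOCK -- but T11 already overwrote it.
r1_literals = {i: str(i + 1).encode() * BLOCK for i in range(5)}
true_r1 = b"".join(r1_literals[i] for i in range(5)) + b"F" * BLOCK
server_digest = hashlib.sha1(true_r1).hexdigest()   # global checksum from server

# T14: reconstruction reads the clobbered block 6 from the shared cache.
rebuilt_r1 = apply_response(cache_file, r1_literals)
client_digest = hashlib.sha1(rebuilt_r1).hexdigest()

print(client_digest == server_digest)  # False: the T15 safeguard fires
```

Running this shows the digests disagreeing precisely because block 6 of the reconstructed R1 page holds R2's data rather than the original content the server's delta assumed.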

So the question is: how likely is the above scenario? I suppose it all depends on the load on the crccache proxy and on the
nature of the changes to the dynamic site(s).

This requires some further thinking and analysis.
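For reference, the global checksum safeguard mentioned at T15 can be sketched as a single comparison after reconstruction. This assumes the cache-server ships a digest of the full upstream page alongside the block delta; the function name and the choice of SHA-1 are illustrative assumptions, not part of any agreed protocol.

```python
import hashlib

def verify_rebuilt_page(page: bytes, server_sha1_hex: str) -> bool:
    """Compare the digest of the locally reconstructed page against the
    digest the cache-server computed over the real upstream response.
    A mismatch means some 'unmodified' block changed on disk between the
    client's checksum calculation and the reconstruction."""
    return hashlib.sha1(page).hexdigest() == server_sha1_hex

# A matching digest accepts the page; any byte difference rejects it.
page = b"reconstructed page body"
good_digest = hashlib.sha1(page).hexdigest()
ok = verify_rebuilt_page(page, good_digest)          # True
bad = verify_rebuilt_page(page + b"!", good_digest)  # False
```

The safeguard cannot repair the page, of course; it only turns silent corruption into a detectable error that the cache-client can answer with a retry or an error response.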



--------
Alex WULMS
Lead Developer/Systems Engineer

Tel: +32 2 655 3931
Information Systems - SWIFT.COM Development
S.W.I.F.T. SCRL


>-----Original Message-----
>From: Martin Langhoff [mailto:martin.langhoff at gmail.com]
>Sent: Thursday, April 02, 2009 10:01 AM
>To: WULMS Alexander
>Cc: Gervase Markham; tridge at samba.org; angxia Huang; jg at freedesktop.org; http-crcsync at lists.laptop.org
>Subject: Re: Apache proxy CRCsync & mozilla gsoc project?
>
>On Thu, Apr 2, 2009 at 9:44 AM, WULMS Alexander <Alex.WULMS at swift.com> wrote:
>>>
>>>If the cache just blows away the cached base page it was using when one of
>>>these errors occurs, that should Do The Right Thing even without seed.
>> I see some concurrency issues and race conditions if the cache would simply blow away the cached base page, just
>> in case that two different concurrent requests are using the same base page and only one of them suffers from the
>> checksum clash.
>
>Is it really a problem? Assuming no seed, the requests are idempotent,
>so if one clashes and the other one doesn't, the successful one
>replaces the local cache entry with the latest successfully downloaded
>page. So it is actually 'reseeding' any requests that follow, as the
>cached document has changed.
>
>Thinking in the opposite direction, will there be actual usable
>opportunities for the upstream proxy to save cpu time or bandwidth
>with content that doesn't change? The "win" scenario for avoiding a
>random seed is:
>
> - The origin server didn't mark this as cacheable.
> - The upstream proxy can cache the files it serves, plus their
>already computed hash.
> - On a request for an already-cached-and-hashed file, the upstream
>proxy could avoid re-computing the hash by comparing the (de-chunked)
>data stream to the file on disk.
>
>So the tradeoff is of increased disk storage and IO during the content
>serving to save CPU time. This implies some assumptions on the
>relative costs of IO and cpu time...
>
>cheers,
>
>
>martin
>btw - pruned the CC's to stop mailman from complaining of too many recipients...
>--
> martin.langhoff at gmail.com
> martin at laptop.org -- School Server Architect
> - ask interesting questions
> - don't get distracted with shiny stuff  - working code first
> - http://wiki.laptop.org/go/User:Martinlanghoff
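For what it's worth, the hash-reuse idea in the quoted "win" scenario could be sketched like this. All names here are hypothetical, and a single SHA-1 stands in for the per-block crcsync hashes purely for brevity; the point is the trade of extra storage for skipped hash computation.

```python
import hashlib

# Hypothetical upstream-proxy store: served body plus its already-computed
# hash, keyed by URL (real storage would be on disk, hence the IO cost).
hash_cache = {}  # url -> (body, sha1_hex)

def hashes_for(url: str, upstream_body: bytes) -> str:
    """Return the hash for upstream_body, recomputing only when the
    (de-chunked) body differs from what we last served for this URL."""
    cached = hash_cache.get(url)
    if cached is not None and cached[0] == upstream_body:
        return cached[1]                        # CPU saved: reuse stored hash
    digest = hashlib.sha1(upstream_body).hexdigest()
    hash_cache[url] = (upstream_body, digest)   # storage/IO paid here
    return digest

d1 = hashes_for("/x", b"page body")   # computed and stored
d2 = hashes_for("/x", b"page body")   # unchanged content: cache hit
d3 = hashes_for("/x", b"new body")    # content changed: recomputed
```

Whether the byte-for-byte comparison against the stored body is actually cheaper than rehashing is exactly the IO-versus-CPU assumption Martin flags above.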