[Http-crcsync] General comments on crcsync document

Wed Jul 8 21:57:13 EDT 2009

On Wed, 2009-07-08 at 16:33 -0700, Pedro R wrote:
> 
> 
> Yes, but you are considering a client which only speaks with crcsync aware server.

Hi Pedro and CRCSYNC team!

I need to object to that characterization - that is not what I am
considering.

>By using the OPTIONS method, the non-crcsync server may not respond
>properly.

Which tells you what you need to know, right? (That it's a non-crcsync
resource.)

I am simply saying that if you want to use HTTP to probe resources
regarding their implementations of extensions (crcsync is an extension
after all), then HTTP provides OPTIONS as the prescribed framework for
doing so. In the best possible world this should just be done with
something analogous to accept-ranges (e.g. "accept-crcsync: v1" ?) which
can be put into the OPTIONS and normal GET responses even when deltas
are not applied (see below) as a sort of advertisement.
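
To make the probing idea concrete, here is a rough Python sketch. The
Accept-Crcsync header name and the "v1" token are only the hypothetical
example from the paragraph above, not anything defined in the crcsync
document, and the helper names (advertised_versions, probe) are made up
for illustration.

```python
import http.client

# Hypothetical header name taken from the "accept-crcsync: v1" example;
# neither HTTP nor the crcsync document defines it.
CAPABILITY_HEADER = "accept-crcsync"


def advertised_versions(headers):
    """Return the versions listed in an Accept-Crcsync style header.

    `headers` is a list of (name, value) pairs, as returned by
    http.client.HTTPResponse.getheaders(). An empty list means the
    resource did not advertise the extension.
    """
    for name, value in headers:
        if name.lower() == CAPABILITY_HEADER:
            return [v.strip() for v in value.split(",") if v.strip()]
    return []


def probe(host, path="/", timeout=5.0):
    """Send an OPTIONS request and report what the resource advertises."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("OPTIONS", path)
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        # A server that rejects or ignores OPTIONS simply advertises nothing.
        return advertised_versions(resp.getheaders())
    finally:
        conn.close()
```

Since the same header could ride along on ordinary GET responses, a
client could run advertised_versions() over every response it sees and
never send a dedicated OPTIONS request at all.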

It is certainly a good thing that there is no *requirement* in the spec
to probe in order to create a backwards compatible request - but you
started this thread concerned about the overhead (and therefore
optimization) of including if-block when speaking to resources that did
not have this extension implemented. And that's a reasonable
implementation strategy imo even if it's out of scope for the spec.

IMHO you really want to separate the concept of "is this implemented"
from "is this applied on this transaction". The former is more or less
an immutable property while the latter might very well depend on some
rather immediate conditions that shouldn't really be cached.

There are a number of good reasons a capable server might not want to
generate a 226 delta at any given time even though it could - and it
should certainly not be required to do so under any circumstance by the
presence of an If-Block on any particular request. (And because it isn't
required to, probing with If-Block and caching the result is not a useful
technique.) The chief scenario in my mind is when it is going to send the
"instance coded" delta anyhow because there is not a useful match in the
request. Instead of tunneling that through any intermediary with a 226
attached it should really send it as a 200 so that it can be properly
cached and understood along the way. Basically the same thing goes if the
delta being generated does not contain any literal blocks - a 304 would
also be a legitimate (and in my mind preferable in most cases) response.
It isn't really the place of the spec to mandate either the delta or the
traditional response, just to specify when they are legal and what they
mean. Think of it kind of like chunked encodings: for a server to
generate one it needs to know that the client is capable of
understanding the response - but even then whether or not to delimit any
particular message as chunked is an implementation decision.
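
As a sketch of the kind of per-transaction decision I mean, here is one
possible server policy in Python. It is only an illustration of the
cases described above, not something the spec does or should mandate,
and the function and parameter names are hypothetical.

```python
def choose_status(client_sent_if_block, matched_blocks, literal_bytes):
    """Pick a response code for one transaction (illustrative policy only).

    client_sent_if_block: the request carried an If-Block header
    matched_blocks:       how many offered blocks matched the current instance
    literal_bytes:        how many bytes would have to be sent as literal data
    """
    if not client_sent_if_block:
        return 200  # plain client, plain response
    if matched_blocks == 0:
        # Nothing useful matched, so the "delta" would be the whole
        # instance anyway; send a cacheable 200 instead of a 226.
        return 200
    if literal_bytes == 0:
        # No literal blocks at all. This sketch assumes the matched
        # blocks reproduce the client's copy unchanged, so a 304 is
        # a legitimate answer.
        return 304
    return 226  # a real delta: some matches plus some literal data
```

A capable server could just as well return 200 in the last case, for
example under load; the presence of If-Block only makes the 226 legal,
it does not force it.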

Delving into a related topic - I'm not so sure that disabling one
transfer optimization (caching) in order to support another (deltas) is
the way to go. I came to this particular party a little late; can anyone
explain to me why this "226" response is used instead of the
Accept-Encoding/Content-Encoding/Vary (If-Block, A-IM) triumvirate, which
is meant to give fine-grained cache control? This is really all just a
variation on etags and i-m after all.
crccache_doc_http_crcsync_protocol.odt (am I reading the right
document?) seems to assert that the need to disable caching is a fait
accompli - but that isn't obvious to me at all. It frankly seems more
like an implementation simplification in order to get some code up and
running (with which I keenly sympathize) but that's not going to fly in
the http standardization world.

Relatedly, mandating 3 variations of "cache-control: no-cache" on every
response is never going to survive standardization. Is there some
inherent reason that deltas cannot coexist in a hierarchical cache
environment but ranges (to pick an example) can?

We all know that firewalls like to prevent extensions of protocols, even
in ways protocols were meant to grow. For HTTP that is especially true
of response codes. I'd bet money you will have more interop problems
inventing 226 responses (a very uncommon thing to do) than you will if
you operate in the existing response code framework. Even inventing new
headers will cause some problems, but generally the application-level
firewalls will just strip them from your requests, which will still leave
a working (if non delta'd) application, whereas an unknown response code
will probably result in the whole response being blocked (because the
firewall cannot semantically make sense of it - and that's its job).

my nickel contribution.

Meanwhile, I admit that tonight is the first time I have read the above
mentioned doc in any real detail.

I have some additional questions about it, that hopefully are kind of
naive. Can anyone help me out?

1] What's the point of the A-IM request header? Does it serve a purpose
that If-Block does not?

2] Why is <number-of-blocks> in If-Block a bit shift value instead of an
integer? Is there some reason to prevent hash sets of sizes that aren't
powers of 2?

3] Why does <number-of-blocks> exist at all? HTTP is an ASCII-based
protocol. The normal way would be to use the usual comma and whitespace
rules and just list the hashes and terminate them with CRLF. Putting the
count in there as a leader just forces some poor client to buffer.

4] Doesn't the server need to know the block length the client used to
calculate the hashes it sent in if-block? Otherwise how can it know it
matches?

5] What's the point of the file-size request header? Or is it really the
block-size I was getting at in #4?

6] The block match uses a single byte binary block id. Block id isn't
defined anywhere - I assume it is the index of the offered hash in the
request if-block? Starting at 0 for the first one? Would be good to say.

7] If the block match uses an 8 bit (single byte) ID, how come up to 512
blocks are allowed in an if-block?

8] I think literal data blocks (both header and body) should have
options for uncompressed data with a binary length indicator. Certainly
not everything compresses well with zlib.
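
Questions 3, 6 and 7 can be illustrated with a small Python sketch. It
assumes, purely for illustration, that the hashes are sent as a comma
separated list of hex tokens and that a block id is the zero-based
position of a hash in that list; neither is stated in the document, and
the function names are made up.

```python
def parse_hash_list(value):
    """Split a comma separated header value into hash tokens.

    No leading count is needed: the list simply ends where the
    header value ends.
    """
    return [tok.strip() for tok in value.split(",") if tok.strip()]


def block_ids(hashes):
    """Map each offered hash to its zero-based position (assumed block id).

    If the same hash is offered twice, the later position wins.
    """
    return {h: i for i, h in enumerate(hashes)}


def ids_fit_in_one_byte(hashes):
    """A single byte can only address 256 distinct block ids (0..255)."""
    return len(hashes) <= 2 ** 8
```

With 512 offered hashes the ids would run from 0 to 511, which does not
fit in the 256 values a single byte can carry, hence question 7.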

Hope this helps,

-Patrick