[Mod_gzip] Proxy caching of mod_gzip compressed contents using "Vary:"
headers
mod_gzip@lists.over.net
Wed, 27 Feb 2002 23:33:47 +0200
Hi folks,
this time it's my turn to ask some questions.
Sit back, relax - I am going to take you down deep into the
abyss of HTTP and proxy caching of negotiated content ...
I am playing around with a caching proxy server delivering
mod_gzip compressed pages. According to its own documentation
it is an HTTP/1.0 proxy (but see below!). Maybe I shouldn't be
doing this at all, but I wanted to learn how it is basically
intended to work. Also, I have more than 30% of
HTTP requests that could easily be cached by a proxy (all
those GIFs and CSS and JavaScript files which are all served
uncompressed anyway, due to Netscape4 bugs), so I am trying
to find out how to keep these away from my Apache machine.
I have basically managed to get the proxy caching what it is
entitled to (my dynamic pages are sending "Cache-Control: no-cache"
as well as an expiration date in the past) and forwarding the
rest to the Apache server, at least for cache validation.
Everything is working fine - unless compression comes into the
game.
So this is my test scenario:
1. I request a page named "index.html" from the server, using
some browser that is explicitly asking for compressed content.
Everything is working as expected:
a) Apache is serving a compressed page;
b) this page is stored inside the proxy cache.
2. I request the same page using the same browser.
Again, it works: The proxy delivers the compressed content
without even asking Apache to revalidate it.
Generally, you _can_ cache compressed content; it's rather
the question whether you _should_ do it.
And now the whole thing is going to become ugly.
3. I use some M$IE 6.0 in "HTTP/1.0 mode" to request the same
document via the same proxy.
We all know that M$IE _can_ understand compressed content;
we also know that it will send the "Accept-Encoding" header
if and only if using HTTP/1.1, which I have now explicitly
told it _not_ to do. So what do we expect to happen?
3.a) M$IE is requesting the page without an "Accept-Encoding"
header.
3.b) The proxy server (iPlanet 3.6, the former Netscape proxy)
is forwarding this request to the Apache server as a
"conditional GET".
It does _not_ just serve it from the proxy cache without
checking, but it might as well not have asked at all.
Unfortunately, the proxy had no clue that the content was
the result of a negotiation process, as Apache has not sent
a "Vary:" header - or has it? (See below for this aspect.)
3.c) Apache replies "HTTP status 304". What else should it do?
After all, Apache doesn't know which version is stored
inside the proxy cache. (I am not sure whether mod_gzip
will even get to know about this event - at least it will
find out that it was no 200 status code and that's it ...)
3.d) The proxy server delivers the (cached and compressed!)
content to M$IE 6.0 - even though the user agent never
claimed to be able to accept it.
3.e) Unlike other experiences, M$IE does _not_ show itself as
being the "tolerant guessing game browser".
It simply decides not to understand the content, and shows
the normal "download dialog" to the user, to let him/her
decide whether to view the content or store it to disk.
In both cases the content is _not_ unzipped, i.e. strange
chars are shown in the browser window or a small binary
file is stored to disk.
M$IE should have known better, as the proxy server surely
forwarded the HTTP headers along with the content, including
the "Content-Encoding" header (which I can prove, see below),
but M$IE decided that this "can't have happened". :-(
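To make step 3 concrete, here is a minimal Python sketch of a cache
that is keyed by URL only, which is roughly how the proxy behaves in
this test. The class NaiveUrlCache and the function origin are
hypothetical stand-ins, not the iPlanet or mod_gzip implementation,
and the conditional GET / 304 round trip is left out because it only
confirms the entry that is already cached.

```python
class NaiveUrlCache:
    """Hypothetical cache keyed by URL only; request headers are ignored."""

    def __init__(self, origin):
        self.origin = origin  # callable: (url, request_headers) -> (headers, body)
        self.store = {}

    def fetch(self, url, request_headers):
        # The first request fills the cache; later ones reuse the entry
        # no matter what the client says it can accept.
        if url not in self.store:
            self.store[url] = self.origin(url, request_headers)
        return self.store[url]


def origin(url, request_headers):
    """Stand-in for the origin server: compress only if gzip is requested."""
    if "gzip" in request_headers.get("Accept-Encoding", ""):
        return ({"Content-Encoding": "gzip"}, b"<compressed bytes>")
    return ({}, b"<html>plain</html>")


cache = NaiveUrlCache(origin)
first = cache.fetch("/index.html", {"Accept-Encoding": "gzip"})
second = cache.fetch("/index.html", {})  # client without Accept-Encoding
```

The second client never asked for gzip but still receives the entry
stored for the first one, which is the situation described in 3.d.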
So we now have an up-to-date browser able to handle gzipped
content in principle but failing miserably to do so, just
because it is too "picky" to look at the "Content-Encoding"
header! But let's not blame M$IE for that, but read on.
The very next HTTP request I have sent to this proxy was a
request for the same URL using Netscape 3.0, a browser which
surely cannot understand gzipped content and never sends any
"Accept-Encoding: gzip" header.
Netscape 3.0 again received the compressed content from the
proxy. But Netscape 3.0 at least displayed a pop-up box
"Warning: unrecognized encoding 'gzip'" before displaying the
(gzipped) content (it does _not_ offer it for download).
So the HTTP headers must at least have been delivered by the
proxy server. M$IE must have had a chance to know better.
Okay, but Netscape 3 _is_ ancient.
What about the very most up-to-date browser of the universe?
Mozilla 0.9.7 has a section inside its configuration
(preferences / debug / networking) which allows the user to
edit the "Accept-Encoding" string being sent to the server.
I eliminated the "gzip" part of it, and again requested the
page from the caching proxy.
And Mozilla, just like M$IE, did _not_ look at the
"Content-Encoding" header, but displayed the compressed content
inside the browser window!
If I add the "gzip" value back to this list then Mozilla displays
the content correctly; if I remove it again, I see the binary
chaos once more. Mozilla is as intolerant as M$IE here. :-(
I cannot make Opera 6.01 _not_ understand gzipped content, as
this browser doesn't allow me to edit its "Accept-Encoding"
header. So here my browser experiments come to an end.
If I understand HTTP caching well enough then Apache _should_
have sent a "Vary:" header to the proxy so that the proxy would
at least have had a chance to detect that it must not deliver the
cached content to a user agent not sending the appropriate HTTP
header fields (which in my case are even two: "Accept-Encoding"
and "User-Agent", because of my "mod_gzip exclude" rule set).
I have even tried to send a "Vary" header myself, by using
mod_headers, but did not succeed in getting the proxy to stop
trusting its cache content (which may well be caused by its
being an HTTP/1.0 proxy only - but still: see below).
So far my report about what has happened - _now_ I would like to
ask some questions.
1. Why doesn't mod_gzip send such a "Vary" header itself?
I believe according to HTTP/1.1 it only SHOULD do so, not MUST,
but who else should know about the problem, if not mod_gzip?
Does mod_gzip tell Apache (via some miraculous internal API)
that negotiation has come into the game, so that Apache itself
might be able to care about the "Vary" headers?
I have some page in the WWW which is using Content Negotiation
via Apache MultiViews, and in this case Apache itself (or maybe
mod_negotiation?) _does_ care about the "Vary:" headers:
Vary: negotiate,accept-language
(Um, this gives me some new idea about question no. 3 below.)
2. While Apache may send "Vary" headers only unconditionally,
mod_gzip would be able to find out whether a URI content may
_ever_ change depending upon some header fields.
If there is an "exclude" rule firing for a file or URI pattern,
then the content of _this_ URL will _never_ be delivered in
compressed form; in these cases no "Vary:" header would be
required. In other cases a "Vary:" header would be necessary
to make the proxy mistrust its cache content.
And it would be helpful if the mod_gzipping Apache sent as few
"Vary:" headers as possible, so that proxy servers can cache
the content in as many cases as possible! If each response
contained a "Vary: User-Agent" because I have some mod_gzip
exclude rules depending upon the User-Agent, the proxy would
never deliver cached content unless two consecutive requests
by pure chance contained the same User-Agent string (which
would be very unlikely in a normal Web environment).
No one but mod_gzip can know about that; the current version of
mod_headers doesn't even allow for conditional creation of HTTP
headers (I know my "exclude" rules and would be able to use
mod_setenvif to conditionally set environment variables, but
mod_headers doesn't allow me to conditionally create headers).
But mod_gzip knows that an "exclude" rule always overrules an
"include" rule; so if I have only "exclude file" and "include
reqheader" rules, i.e. work with a "positive list" of browsers,
mod_gzip would be able to know when to send a "Vary" header.
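The decision asked for in question 2 could look roughly like the
following Python sketch. It is only an illustration of the idea: the
function vary_for and its parameters are hypothetical, the rule set is
modelled as simple glob patterns (not real mod_gzip rule syntax), and
nothing here reflects what mod_gzip actually does.

```python
import fnmatch


def vary_for(path, exclude_file_patterns, include_reqheader_names):
    """Pick a Vary value for path, or None when no negotiation can occur.

    exclude_file_patterns: glob patterns for files that are never
        compressed (assumption: an exclude rule always wins).
    include_reqheader_names: request headers that the remaining
        "include" rules look at, e.g. ["User-Agent"].
    """
    # A file that is never compressed has only one variant: no Vary needed.
    if any(fnmatch.fnmatchcase(path, pat) for pat in exclude_file_patterns):
        return None
    # Everything else depends on Accept-Encoding plus the rule headers.
    names = ["Accept-Encoding"]
    for name in include_reqheader_names:
        if name.lower() not in [n.lower() for n in names]:
            names.append(name)
    return ", ".join(names)


never_compressed = ["*.gif", "*.css", "*.js"]
print(vary_for("/logo.gif", never_compressed, ["User-Agent"]))
print(vary_for("/index.html", never_compressed, ["User-Agent"]))
```

With these assumed rules the GIF gets no "Vary" header at all and
stays fully cacheable, while the HTML page is marked as depending on
both request headers.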
3. If anything of the above makes sense at all, then: which fields
should a "Vary:" header to be sent by mod_gzip contain?
I tried sending "Vary: Accept-Encoding" as well as
"Vary: Content-Encoding" - neither worked. The HTTP/1.1
specification in chapter 14.44 tells me
"The Vary field value indicates the set of request-header fields
that fully determines, while the response is fresh, whether a cache
is permitted to use the response to reply to a subsequent request
without revalidation". This would refer to the HTTP header of
the request (i. e. "Accept-Encoding").
Chapter 13.6 of HTTP/1.1 tells me "When the cache receives a
subsequent request whose Request-URI specifies one or more cache
entries including a Vary header field, the cache MUST NOT use such
a cache entry to construct a response to the new request unless all
of the selecting request-headers present in the new request match
the corresponding stored request-headers in the original request."
So if I get that right, then the proxy must store not only the
HTTP response header inside its cache (to deliver it to the
UserAgent) but the original HTTP request header as well, to
decide whether this one and the header of some subsequent HTTP
request do "match" (a term defined in chapter 13.6 as well).
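The reading of chapter 13.6 given above can be sketched in Python as
follows. The function names are hypothetical, and the comparison is a
simplification: RFC 2616 defines "match" in terms of adding or
removing linear white space and combining repeated header fields,
whereas this sketch only collapses runs of white space and treats a
missing header as different from any present one.

```python
def _norm(value):
    """Collapse white space; keep None for an absent header."""
    return None if value is None else " ".join(value.split())


def selecting_headers_match(vary_value, stored_request, new_request):
    """Return True if a stored entry may be reused for new_request.

    vary_value is the Vary field value of the stored response; the two
    request arguments are dicts with lower-case header names as keys.
    """
    names = [n.strip().lower() for n in vary_value.split(",") if n.strip()]
    if "*" in names:
        return False  # "Vary: *" never matches
    # Every header named in Vary must agree between the two requests.
    return all(
        _norm(stored_request.get(name)) == _norm(new_request.get(name))
        for name in names
    )


stored = {"accept-encoding": "gzip", "user-agent": "Mozilla/5.0"}
print(selecting_headers_match("Accept-Encoding", stored, {"accept-encoding": "gzip"}))
print(selecting_headers_match("Accept-Encoding", stored, {}))
```

This only works if the cache keeps the original request headers next
to the stored response, which is exactly the consequence drawn above.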
4. Nearly the same question, but this time it's about syntax, not
semantics: Let's say the answer to no. 3 was "Accept-Encoding
and UserAgent". What should this "Vary:" header look like?
HTTP/1.1 chapter 14.44 tells me that "The Vary field value
indicates the set of request-header fields ..." - but not in which
syntax I have to describe a "Set".
I guessed this to be done in the "normal" way of delivering
list values, i. e. a comma-separated list of values, and tried
"Vary: Accept-Encoding, UserAgent".
But doing so made the proxy server tell me that I sent a
reply "not conforming to RFC 2616, chapter 14" (remember, we
are still talking about an HTTP/1.0 proxy!).
I changed this to only send "Vary: Accept-Encoding", and the
proxy now accepted the response - but still did not decide
to withhold the cached content from the user agent.
So how would the correct syntax for a list of "Vary:" fields
be in my case? And is the proxy right about RFC 2616?
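For reference, the grammar in RFC 2616, chapter 14.44 is
Vary = "Vary" ":" ( "*" | 1#field-name ), where "#" is the RFC's
notation for a comma-separated list. A small Python sketch of a parser
for that grammar follows; parse_vary is a hypothetical helper, and it
checks syntax only. Note that it accepts "UserAgent" as a syntactically
valid name even though the request header is actually spelled
"User-Agent"; whether that spelling is what the proxy objected to is
not something this sketch can tell.

```python
import re

# RFC 2616 "token": characters other than control chars and separators.
_TOKEN = re.compile(r"^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$")


def parse_vary(value):
    """Split a Vary field value into field names, or return ["*"]."""
    items = [item.strip() for item in value.split(",")]
    items = [item for item in items if item]  # ignore empty list elements
    if items == ["*"]:
        return ["*"]
    if not items:
        raise ValueError("Vary needs at least one field name")
    for item in items:
        # "*" may only stand alone; everything else must be a token.
        if item == "*" or not _TOKEN.match(item):
            raise ValueError("not a valid field name: %r" % item)
    return items


print(parse_vary("Accept-Encoding, User-Agent"))
```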
5. One more strange experience: I tried to let my Apache send
_two_ "Vary:" headers, by using mod_headers and the directives
Header add "Vary:" "Accept-Encoding"
Header add "Vary:" "User-Agent"
since it looks like my proxy doesn't allow comma-separated values,
but might possibly accept two "Vary" headers.
The Apache 1.3 documentation
(http://httpd.apache.org/docs/mod/mod_headers.html#header)
tells me that in this case "The response header is added to the
existing set of headers, even if this header already exists. This can
result in two (or more) headers having the same name." - exactly
what I intended to do.
_But_ Apache didn't send two headers! Instead, it was sending
"Vary: Accept-Encoding,User-Agent", i.e. one header only, which
is what I would have expected to be the effect of using
Header append "Vary:" "Accept-Encoding"
Header append "Vary:" "User-Agent"
which I also tried, and which again caused one "Vary:" header
to be sent to the client.
Is this a bug in Apache I have discovered?
I even checked that by sending a request via Telnet, as I tend
now to mistrust what Perl's LWP::UserAgent tells me about HTTP
headers received ...
I have later changed the above to
Header add "Vary:" "negotiate,Accept-Encoding"
Header add "Vary:" "negotiate,User-Agent"
but Apache still joins these to one line:
Vary: negotiate,Accept-Encoding,User-Agent
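RFC 2616, chapter 4.2 says that several header fields with the same
name, whose value is defined as a comma-separated list, can be
combined into one field by joining the values with commas, without
changing the meaning of the message. The Python sketch below
reproduces the joined line observed above, including the dropped
duplicate "negotiate". The function combine_vary is hypothetical and
is not taken from the Apache source; it only shows that such a merge
is a well-defined operation.

```python
def combine_vary(values):
    """Merge several Vary field values into one, dropping repeated names."""
    seen = set()
    merged = []
    for value in values:
        for name in value.split(","):
            name = name.strip()
            # Header field names are case-insensitive, so compare lower-cased.
            if name and name.lower() not in seen:
                seen.add(name.lower())
                merged.append(name)
    return ",".join(merged)


print("Vary: " + combine_vary(["negotiate,Accept-Encoding",
                               "negotiate,User-Agent"]))
```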
6. Finally, if all of the above would cause just too much trouble
for all the proxies out there, mod_gzip would still have
the chance to tell them that some negotiation _might_ have been
used that may be too complicated for the proxies to understand,
by sending a header like
"Vary: *"
which is said to "always fail to match and subsequent requests on
that resource can only be properly interpreted by the origin server."
by RFC 2616, section 13.6.
By doing so mod_gzip might just tell all the proxies out there
to stay away from any compressed content.
And again, no one but mod_gzip would be able to send as few of
these headers as possible - we don't need them if mod_gzip
finds out that it must _never_ compress this document and thus
no negotiation will ever take place for it.
Greetings, Michael