[Mod_gzip] MSIE cannot handle Vary header(s)
mod_gzip@lists.over.net
Tue, 10 Dec 2002 04:20:02 EST
Hello all.
This is a continuation of the thread entitled...
[Mod_gzip] "mod_gzip_send_vary=Yes" disables caching on IE
After several hours spent doing my own testing with MSIE and
digging into MSIE internals with a kernel debugger I think I
have the answers.
The news is NOT GOOD.
I will start with a SUMMARY first for those who don't have the
time to read the whole, ugly story but for those who want to
know where the following 'conclusions' are coming from I
refer you to the rest of the message and the "detail".
SUMMARY
There is only 1 request header value that you can use with
"Vary:" that will cause MSIE to cache a non-compressed
response and that is ( drum roll please ) "User-Agent".
If you use ANY other (legal) request header field name in
a "Vary:" header then MSIE ( Versions 4, 5 and 6 ) will
REFUSE to cache that response in the MSIE local cache.
This is why Jordan is seeing a caching problem and Slava
is not. Slava is 'accidentally' using the only possible "Vary:"
field name that will cause MSIE to behave as it should
and cache a non-compressed response.
Jordan is seeing non-compressed responses never being
cached by MSIE because the responses are arriving
with something other than "Vary: User-Agent" like
"Vary: Accept-Encoding".
It should be perfectly legal and fine to send "Vary: Accept-Encoding"
on a non-compressed response that can 'Vary' on that field
value and that response SHOULD be 'cached' by MSIE...
but so much for assumptions. MSIE will NOT cache this response.
MSIE will treat ANY field name other than "User-Agent"
as if "Vary: *" ( Vary + STAR ) was used and it will
NOT cache the non-compressed response.
The reason the COMPRESSED responses are, in fact,
always getting cached no matter what "Vary:" field name
is present is just as I suspected... it is because MSIE
decides it MUST cache responses that arrive with
"Content-Encoding: gzip" because it MUST have a
disk ( cache ) file to work with in order to do the
decompression.
The problem exists in ALL versions of MSIE but it's
even WORSE for any version earlier than 5.0. MSIE 4.x
will not even cache responses with "Vary: User-Agent".
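For anyone who wants the SUMMARY in a form they can poke at, here is a rough Python model of the behavior I observed. It is NOT Microsoft's code. It is my own sketch of the observed rules, with some assumptions: header names are passed in lower-case, the trailing/leading whitespace of a header value is ignored, and only the major MSIE version matters.

```python
from typing import Dict

def msie_caches_locally(version_major: int, headers: Dict[str, str]) -> bool:
    """Model of the observed MSIE local-cache decision (not real MSIE code).

    version_major: MSIE major version (4, 5, 6).
    headers: response headers, keys assumed to be lower-case.
    """
    # Compressed responses always get cached, whatever "Vary:" says,
    # because MSIE needs a disk file to decompress from.
    if headers.get("content-encoding", "").strip().lower() == "gzip":
        return True

    vary = headers.get("vary")
    if vary is None:
        return True  # no "Vary:" at all -> normal caching

    # The only special case: a bare "User-Agent", and only in 5.x and later.
    if version_major >= 5 and vary.strip().lower() == "user-agent":
        return True

    # Every other "Vary:" value is treated as if it were "Vary: *".
    return False
```

So msie_caches_locally(6, {"vary": "Accept-Encoding"}) comes back False, which is exactly what Jordan is seeing.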
That's it for the SUMMARY.
The rest of this message contains the gory details.
There are 'sections' to this since it gets a little deep.
* WHY WILL MSIE ONLY CACHE "Vary: User-Agent"
Because this was specifically reported as a bug against
MSIE 4.0 way back in 1999 and they hacked a fix into
the browser base code for it. That 'hack' has been
carried forward to every new version but they have never done
anything else with "Vary:" and to this day "User-Agent" is the
only string value for "Vary:" that they are even 'checking' for.
I discovered this only AFTER using a kernel debugger and
watching the code evaluate the "Vary:" field. Once I saw that
it was the only possible value that would cause anything
to be 'cached' I did a GOOGLE search and found that it
has been a known issue since 1999.
For anyone who is interested you might want to check out
the following links and read the 'history' behind this bug...
This first link is a Problem Report submitted to the Apache
Server folks on March 25, 1999. The TITLE of that bug report is...
"Client bug: IE 4.0 breaks with "Vary" header"
http://bugs.apache.org/index.cgi/full/4118
The problem was much more serious in MSIE 4.0 than
it is now. MSIE 4.0 was actually getting TOTALLY
confused whenever a "Vary:" header would arrive
in a response and it was treating it as a download error
and putting up a nasty Dialog box.
It was a very VISIBLE bug and that's why Microsoft
stepped in and "fixed" it.
The following is taken from the PR report itself.
[snip]
When Internet Explorer receives a "Vary: Host" header, or a "Vary: *" header,
the system will improperly report "file not found". The exact error
message is: "Internet Explorer cannot download from the Internet site
viewer.zip from palm.dahm.com. The downloaded file is not available. This could be
due to your Security or Language settings or because the server was
unable to retrieve the requested file."
[snip]
What (apparently) happened when Microsoft realized they had
this bad bug in their "Vary:" handler is that they simply 'hacked'
it so that at least it would not put up such a bad ( and incorrect )
error message. It was around that time that they at least added
some code to pick up a "Vary:" field but all they really did was
set a flag to cause ALL RESPONSES that have "Vary:" headers
to be treated as "Vary: *" and nothing would ever be cached.
This was really no different from what all existing versions of
SQUID ( at that time ) would do. It was only 7 weeks ago that
a version of SQUID emerged which would do anything other
than this base level 'hack' at "Vary:" handling.
ASIDE: What is interesting about the above Apache PR report
from 1999 is that they added a 'hack' of their own to get
around the problem which few people know about but which
could NOW cause all kinds of NEW problems if anyone
is still actually using the 'hack'...
The following is taken directly from Dean Gaudet's commit
log when he 'patched' the server to get around the "Vary:"
problem(s) in MSIE...
[snip]
A new environment variable, "force-no-vary", has been added.
If set with BrowserMatch, the Vary field will not be sent as part of
the response header. This change should appear in the next release
after 1.3.6. Thanks for the report and for using Apache!
[snip]
What that means is that people can UNCONDITIONALLY
set their Servers to NEVER send "Vary:" headers based
totally on the "BrowserMatch" directive in Apache.
This means that NO ONE would be able to add a "Vary:"
header ( like mod_gzip or mod_deflate or DynaZip or
whoever ) even if they WANTED to. It will be 'stripped out'.
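To make the 'stripping' effect concrete, here is a small Python model of what "force-no-vary" does to a response. It is NOT Apache code. The function name and the example regex are made up for illustration. The only part taken from the commit log is the idea that a BrowserMatch-style regex on the User-Agent decides whether the "Vary:" field is dropped, no matter which module added it.

```python
import re
from typing import Dict

def apply_force_no_vary(request_user_agent: str,
                        response_headers: Dict[str, str],
                        browser_match: str) -> Dict[str, str]:
    """Hypothetical model of force-no-vary (not Apache source).

    If the request's User-Agent matches the BrowserMatch-style regex,
    any Vary header is removed from the response. Otherwise the headers
    are returned unchanged. The input dict is never modified.
    """
    if re.search(browser_match, request_user_agent):
        return {name: value for name, value in response_headers.items()
                if name.lower() != "vary"}
    return dict(response_headers)
```

With a pattern as broad as r"MSIE", every MSIE request would lose the "Vary: Accept-Encoding" that mod_gzip (or anyone else) tried to add.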
Something to keep in mind as all this "Vary:" stuff starts
looming closer over the horizon.
But I digress...
Sometime after those initial MSIE bug reports and the first addition
of any code to even handle "Vary:" at all... the single pickup
for "Vary: User-Agent" was added in response to a specific
bug report against MSIE.
They added a flag to allow responses with "Vary: User-Agent"
to be cached locally and this fixed the 'bug' report but ALL
other values for "Vary:" field were (are) still ignored and
are treated just like "Vary: *" and nothing is cached locally.
The following is a 'message thread' at the W3C.ORG
forum itself from just this past spring/summer which shows
that the problems still exist even in MSIE 6.0.
If you go to this message link you can just click on 'Previous
Message' and 'Next message' to move forward and back
through the thread....
http://lists.w3.org/Archives/Public/ietf-http-wg/2002AprJun/0046.html
* TEST CASES
I played with an HTTP Server and MSIE and narrowed down
the test response to the absolute minimum that would
produce the CORRECT behavior.
NOTE: At no time during these tests was I sending
any actual compression. My goal was to narrow down
what happens with NON-COMPRESSED responses
which is really the whole issue.
The CORRECT behavior is for the response to be CACHED
locally by MSIE and only retrieved if/when it EXPIRES or
if the user presses CTRL-R ( Reload ) or hits the "Refresh"
button.
Once MSIE has cached a page locally then hitting either the
FORWARD or BACK buttons or choosing the page from the
History list should NOT cause a new request for the page
to be emitted from the browser. When it is functioning
correctly MSIE should do nothing but reload the page
from the local cache.
The following was my 'base response' ( Similar to Jordan's
test case since it only sends back "Hello World"... )
[snip]
HTTP/1.1 200 OK
Content-Type: text/plain
Connection: Close

Hello World
[snip]
All versions of MSIE will simply 'do the right thing' when this
document arrives and will store it in the local cache and will
NOT 'reload' it unless you force it to by pressing 'Refresh'
or by clearing your local cache.
It makes no difference if there is an "Expires:" field.
If there isn't one... the default value assigned to the
document in the local cache is "Expires: None" which
means it will be there until you clear your cache ( manually ).
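If anyone wants to reproduce these tests, here is a minimal Python sketch that builds the raw bytes of the 'base response' above, optionally with a "Vary:" and/or "Cache-Control:" field added. The function name is my own (hypothetical). It just produces the bytes. How you write them out (a tiny socket server, netcat, whatever) is up to you.

```python
from typing import Optional

def build_test_response(vary: Optional[str] = None,
                        cache_control: Optional[str] = None,
                        body: str = "Hello World") -> bytes:
    """Build the minimal non-compressed test response as raw HTTP bytes."""
    lines = [
        "HTTP/1.1 200 OK",
        "Content-Type: text/plain",
        "Connection: Close",  # connection close marks the end of the body
    ]
    if vary is not None:
        lines.append(f"Vary: {vary}")
    if cache_control is not None:
        lines.append(f"Cache-Control: {cache_control}")
    # Header lines are CRLF-terminated and the header block ends with
    # an empty line, then the body follows.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body.encode("ascii")
```

build_test_response() gives the base test, build_test_response(vary="Accept-Encoding") gives the next one, and so on down the list.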
The very next test I tried was simply adding a "Vary:"
field and I chose the one that is MOST relevant to
this discussion... "Vary: Accept-Encoding"
Here is the next response that arrived in MSIE...
[snip]
HTTP/1.1 200 OK
Content-Type: text/plain
Connection: Close
Vary: Accept-Encoding

Hello World
[snip]
No version of MSIE will cache this response.
If any response ever arrives with "Vary: Accept-Encoding"
then MSIE will constantly go back upstream to get
a new copy of the document... even when you are
simply using the BACK/FORWARD buttons or choosing
the original URI from the browser history list.
This is the behavior that Jordan and Tomaz and others
have discovered.
The next thing I tried was simply some 'other'
request field name along with Vary:
The "User-Agent" field name would probably
be the second most-relevant to compression
variants so here is what I sent to MSIE next...
[snip]
HTTP/1.1 200 OK
Content-Type: text/plain
Connection: Close
Vary: User-Agent

Hello World
[snip]
This worked fine. The response WAS CACHED by MSIE ( 5.x, 6.x )
just fine... just like the original base document with no "Vary:" field.
NOTE: Only versions of MSIE higher than 4.x will cache this.
I then tried all kinds of 'combinations' of strings in the
"Vary:" field. In the interests of time here I will just
list what 'variations' I tried for "Vary:" and the results.
Each test was identical to the 'base test' in all
other respects.
This gets pretty interesting...
Vary: Accept-Encoding <- Response is NOT cached by MSIE
Vary: User-Agent      <- Response is CACHED! Always reloads from local disk
Vary: Accept-Language <- Response is NOT cached by MSIE
Vary: *               <- Response is NOT cached by MSIE ( Correct behavior )
Vary: Host            <- Response is NOT cached by MSIE
NOTE: I think it's pretty amazing that MSIE won't even
cache anything just because it has "Vary: Accept-Language".
Of all the request headers... "Accept-Language" would
probably be the most often used "Vary:" field and
"Vary: Accept-Language" represents one of the reasons
why "Vary:" was invented in the first place... so people
could be sure they are getting the right LANGUAGE
on the pages they ask for.
* SPELLING COUNTS ( SORT OF )
A little more testing proved that the actual pickup for
"User-Agent" in MSIE ( the only one they are checking
for ) is NOT case-sensitive, so it must be a case-insensitive
compare rather than a plain 'strncmp()' ( which IS case-sensitive ).
This is what you would expect.
Vary: User-Agent <- Response is CACHED
Vary: user-Agent <- Response is CACHED
Vary: User-agent <- Response is CACHED
Vary: user-agent <- Response is CACHED
However... here is where SPELLING COUNTS...
Vary: User-Agent: <- Response is NOT CACHED ( Extra colon on the end )
Vary: User Agent <- Response is NOT CACHED ( SPACE instead of HYPHEN )
Vary: UserAgent <- Response is NOT CACHED ( Hyphen left out )
Punctuation also produces some strange results...
( Keep in mind that you are supposed to be allowed to list
any number of field names separated by commas... )
Vary: User-Agent, <- Response is NOT CACHED ( Single comma on the end )
Vary: ,User-Agent <- Response is NOT CACHED ( Single comma on front )
Vary: "User-Agent" <- Response is NOT CACHED ( Quote marks not allowed )
The following was a surprise....
Even though the "User-Agent" field name is present MSIE will
still refuse to cache the response if there is ANOTHER field name present...
Vary: User-Agent, Accept-Encoding <- Response is NOT CACHED.
And despite what Slava says about using "Vary: User-Agent,*"
( User-Agent + comma + STAR ) I could NOT get any of the
following responses to cache at all...
Vary: User-Agent,* <- Response is NOT CACHED ( No space after comma )
Vary: User-Agent, * <- Response is NOT CACHED ( Space after comma )
Vary: *,User-Agent <- Response is NOT CACHED ( No space after comma )
Vary: *, User-Agent <- Response is NOT CACHED ( Space after comma )
Slava? Are you still reading along?
Are you SURE you are seeing non-compressed documents
cached locally by MSIE using "Vary: User-Agent,*" ?
I could NOT get this to work in ANY version of MSIE
but just plain old "Vary: User-Agent" DOES WORK.
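Here is every "Vary:" value I tried for a non-compressed response ( MSIE 5.x/6.x ) along with a one-line Python hypothesis that reproduces ALL of the results. This is only a hypothesis built from the test results, not the actual MSIE code. Note that a prefix compare on the first 10 characters would ALSO accept "User-Agent:" and "User-Agent,", which did NOT cache, so the sketch compares the WHOLE value, ignoring case. No comma-list parsing at all. The whitespace stripping is an assumption ( I did not test trailing spaces ).

```python
# Observed results from the tests above.
# True = cached locally by MSIE, False = NOT cached.
OBSERVED = {
    "Accept-Encoding": False,
    "Accept-Language": False,
    "*": False,
    "Host": False,
    "User-Agent": True,
    "user-Agent": True,
    "User-agent": True,
    "user-agent": True,
    "User-Agent:": False,
    "User Agent": False,
    "UserAgent": False,
    "User-Agent,": False,
    ",User-Agent": False,
    '"User-Agent"': False,
    "User-Agent, Accept-Encoding": False,
    "User-Agent,*": False,
    "User-Agent, *": False,
    "*,User-Agent": False,
    "*, User-Agent": False,
}

def msie_vary_allows_cache(vary_value: str) -> bool:
    """Hypothesis: the entire Vary value must equal "User-Agent",
    compared case-insensitively. No list parsing, no prefix match."""
    return vary_value.strip().lower() == "user-agent"

if __name__ == "__main__":
    for value, cached in OBSERVED.items():
        assert msie_vary_allows_cache(value) == cached, value
    print("hypothesis matches all observed cases")
```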
* CACHE-CONTROL FORCES MSIE TO WORK CORRECTLY
"Vary: Accept-Encoding" will always cause MSIE to refuse
to cache the response.
However... if you simply send the following instead...
Vary: Accept-Encoding
Cache-Control: private
Then MSIE will now function 'normally' and cache the
( non-compressed ) response.
This is not a 'magic bullet' however nor is it a 'fix'.
It's really just an interesting discovery.
Unfortunately you gain nothing but a new local cache
file because MSIE will still go back upstream for a
new version of the page every time you hit BACK
or FORWARD just as if the page was never written
to the local cache at all ( which it WAS... and it
even has "Expires: None" on it which means MSIE
SHOULD be reloading from the local cache )
This looks like yet another bug with regards to
'Cache-Control:' or something. Not sure.
That's about it.
Like I said... the news is NOT GOOD here.
It means that despite the fact that no Proxy Servers
( other than the brand new SQUID 2.5 only out
for 7 weeks now ) really support "Vary:" the way it was
designed... MSIE itself has a LOOONG way to go before
anyone will be able to use "Vary:" for anything.
...and since you can't stop an inline Proxy from
forwarding the "Vary:" headers to the end-point
browser ( You probably SHOULD be able to
but that's a whole 'nother thread of discussion )
then it is looking more and more as if using
"Vary:" for anything is just simply going to CAUSE
far more problems than it will SOLVE.
It will be YEARS before all this gets fixed... if ever.
Gotta run.
If you have read this far down then your brain
is probably as fried as mine is at the moment.
Later...
Kevin
PS: If anyone is still seeing totally different results than
what I have now seen with my own eyes ( and debuggers )
then let's keep this going and figure out what the heck
is going on here.