[Mod_gzip] "mod_gzip_send_vary=Yes" disables caching on IE? (1.3.26.1a)

mod_gzip@lists.over.net mod_gzip@lists.over.net
Sat, 7 Dec 2002 03:07:23 EST




In a message dated 12/7/2002 1:35:18 AM Central Standard Time, 
jr-list-mod_gzip@quo.to writes:


> I wrote:
> > After further testing, I've found that IE 6 *will* cache a page containing
> > "Vary: Accept-Encoding" just fine *if* there's also a "Content-Encoding:
> > gzip" header. It's only the files that mod_gzip sends back uncompressed with
> > no Content-Encoding header that are not being cached.
> 
> It's even more strange than I first thought...
> Without "Content-Encoding: gzip", it appears IE 6 will not cache pages
> containing *any* type of Vary header. (By "will not cache" I mean that after
> the page is loaded once, it is reloaded every time you pass by it using
> Back/Forward.)
> 
> Below are some tests I performed with PHP. mod_gzip was not installed.
> 
> This will cache:
> 
>   echo "Hi";
> 
> This will NOT cache:
> 
>   header("Vary: Accept-Encoding");
>   echo "Hi";
> 
> This will NOT cache:
> 
>   header("Vary: blahblah");
>   echo "Hi";
> 
> This will cache:
> 
>   header("Content-Encoding: gzip");
>   echo gzencode("Hi2");
> 
> This will cache:
> 
>   header("Vary: Accept-Encoding");
>   header("Content-Encoding: gzip");
>   echo gzencode("Hi2");
> 
> This will cache:
> 
>   header("Vary: blahblah");
>   header("Content-Encoding: gzip");
>   echo gzencode("Hi2");
> 
> 
> Jordan Russell
> 
> _______________________________________________
> mod_gzip mailing list
> mod_gzip@lists.over.net
> http://lists.over.net/mailman/listinfo/mod_gzip

Hi Jordan...

I think what you are discovering here is that whether MSIE
writes the response to a cache file or not has very little
to do with the headers at all.

I believe what you are seeing is simply the way MSIE actually
does the 'decompression'.

MSIE always uses 'cache files' to do it. ( So does Netscape )

If something arrives and it happens to be compressed
( as indicated by "Content-Encoding: gzip" but sometimes
you don't even need this with MSIE and things still work
since MSIE keys off 'magic bytes' at the start of body
data ) then MSIE needs a place to WRITE the data no
matter what. It can't just start sending it to MSHTML.DLL
just yet because that DLL won't know what to do with it
until it's decompressed.
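
To make the 'magic bytes' remark concrete, here is a minimal Python sketch of
sniffing a body for the gzip signature. The function names are made up for
illustration and this is NOT MSIE's actual code; the only hard fact in it is
that every gzip stream starts with the bytes 0x1f 0x8b.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)

def looks_gzipped(body: bytes) -> bool:
    """Return True if the body starts with the gzip magic bytes."""
    return body[:2] == GZIP_MAGIC

def decode_body(headers: dict, body: bytes) -> bytes:
    """Hypothetical helper: decompress when the header says gzip,
    or when the header is missing but the body sniffs as gzip."""
    declared = headers.get("Content-Encoding", "").strip().lower() == "gzip"
    if declared or looks_gzipped(body):
        return gzip.decompress(body)
    return body

compressed = gzip.compress(b"Hi2")
print(decode_body({"Content-Encoding": "gzip"}, compressed))  # header present
print(decode_body({}, compressed))                            # header missing, sniffed
print(decode_body({}, b"Hi"))                                 # plain body passes through
```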

So MSIE pretty much HAS to open a 'cache' file in order
to store the compressed data as it arrives. It can 'bleed'
the data off to MSHTML.DLL and start decompressing
before the whole page arrives but still... it has to have a
place to store what hasn't been decompressed yet 
since this is happening on another thread of execution
different from the download.
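
The 'bleed it off while it is still arriving' idea can be sketched with
Python's zlib module. This only illustrates incremental gzip decoding in
general; the 64-byte chunk size and the fake "network" are assumptions, and
it says nothing about how MSIE really splits the work between threads.

```python
import gzip
import zlib

def stream_decompress(chunks):
    """Yield decompressed pieces as compressed chunks arrive."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        piece = decoder.decompress(chunk)
        if piece:
            yield piece  # this is what could be handed to the renderer early
    tail = decoder.flush()
    if tail:
        yield tail

page = b"<html>" + b"Hi2 " * 1000 + b"</html>"
wire = gzip.compress(page)
# Pretend the network delivers the response 64 bytes at a time.
arriving = (wire[i:i + 64] for i in range(0, len(wire), 64))
rendered = b"".join(stream_decompress(arriving))
print(rendered == page)
```

The decoder object carries whatever has arrived but cannot be decoded yet,
which is the 'place to store it' role described above.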

If something arrives that is NOT compressed and it
has a "Vary:" header... then I believe that's when you
are seeing MSIE's behavior of NOT caching the
response at all. It just feeds the incoming data to
MSHTML.DLL, which renders it, and it never gets
written to disk anywhere because it doesn't have to.

Netscape does the same thing.
It will, however, use TWO different 'cache' files in order
to perform the decompression. Watch your Netscape
cache closely if/when compressed pages are arriving.
You will see TWO different cache files hanging around
after a compressed page has arrived. One is the
raw compressed data and the other is the uncompressed
version. Netscape does the decompression from file <to> file
instead of file <to> memory like MSIE does.

The proof that Netscape keeps BOTH files around is in
some of the bug reports regarding "Content-Encoding: gzip"
and Netscape. Sometimes Netscape gets confused and
uses the WRONG cache file ( the raw, compressed one )
and you get garbage on your printer even though it has
already decompressed the page. That bug is simply 
some other code thread not getting the 'word' that there
are TWO cache files for that particular response and
it just uses the FIRST one... which is still compressed.

MSIE uses TWO cache files as well ( as far as I can
tell ) but it always DELETES the one used to hold 
the incoming raw compressed data the minute it's
been decompressed. That's why MSIE rarely has
the kind of problems that Netscape does. When the
dust settles there is only 1 local cache file and it
is the decompressed version.

The BAD part about the way MSIE does it, however, is
that not only does it delete the original "Content-Encoding: gzip"
version of the response... it now goes brain dead and has
absolutely no idea that the response ever showed up 
compressed. At least Netscape makes SOME attempt
to remember this fact following the download.

You can see for yourself the 'confusion' in MSIE after
it has decompressed. If you make calls to their cache
APIs and try to discover things about the cache files you
will find that they have 'kept' the "Content-Encoding: gzip"
header on the cache control header of responses that
arrived compressed but the actual body data cache file
is no longer "Content-Encoding: gzip" at all. They wipe
out the original compressed cache file but then they
appear to forget to remove the "Content-Encoding: gzip"
header from the cache file control header for that URI.

Not that any of this matters. MSIE is not a "Proxy Cache"
and it does NOT have to 'follow the rules' for HTTP Proxies.
It's a User-Agent and how it uses its own local files is
its own business.

I have no idea what versions of MSIE will/will not 
perform "Vary:" correctly. I was under the impression
that NONE of them really will and it's always been a
given that responses arriving with "Vary:" will simply
be displayed but never cached and treated as if they
arrived with "Expires: -1".

Without all the code to actually perform the "Vary:" scheme
according to HTTP specs that is all a Cache can do. It 
can't ever store something that "Varies:" if it has no code
to figure out when it does NOT "Vary:".
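
For anyone wondering what 'the code to figure out when it does NOT vary'
amounts to, here is a rough Python sketch of the matching rule from the HTTP
spec: a stored response may be reused only if the request headers named in
"Vary:" match those of the request that produced it. TinyCache and vary_key
are hypothetical names for illustration, not any browser's or proxy's
real implementation.

```python
def vary_key(vary_header: str, request_headers: dict):
    """Build the secondary cache key named by a Vary header.

    Returns None for "Vary: *", which no later request can match.
    """
    names = [n.strip().lower() for n in vary_header.split(",") if n.strip()]
    if "*" in names:
        return None
    lowered = {k.lower(): v.strip() for k, v in request_headers.items()}
    return tuple((n, lowered.get(n, "")) for n in sorted(names))

class TinyCache:
    """Hypothetical cache keyed by URI plus the Vary-selected request headers."""

    def __init__(self):
        self._entries = {}  # uri -> list of (vary_header, key, body)

    def store(self, uri, request_headers, vary_header, body):
        key = vary_key(vary_header, request_headers)
        if key is None:
            return False  # cannot be stored for reuse
        self._entries.setdefault(uri, []).append((vary_header, key, body))
        return True

    def lookup(self, uri, request_headers):
        for vary_header, key, body in self._entries.get(uri, []):
            if vary_key(vary_header, request_headers) == key:
                return body
        return None

cache = TinyCache()
cache.store("/page", {"Accept-Encoding": "gzip"}, "Accept-Encoding", b"gzip variant")
print(cache.lookup("/page", {"Accept-Encoding": "gzip"}))      # same header value: hit
print(cache.lookup("/page", {"Accept-Encoding": "identity"}))  # different value: miss
```

A cache without this comparison step has only the fallback described above:
display the response and never reuse it.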

The "Vary:" scheme itself was never really meant for
end-point user caches since, well, that's exactly what
they are... the FINAL STOP. The "Vary:" scheme was
really meant for intermediate Proxies who might be
getting different requests for the same URI from
different User-Agents.

Yours
Kevin
