[Mod_gzip] Which browsers lie about accepting compressed html files?
Mitchell Steven Spector
mod_gzip@lists.over.net
Tue, 30 Jan 2001 03:47:48 -0800
On Tue, Jan 30, 2001 at 02:49:50AM -0600, Kevin Kiley wrote:
>
> Hi Mitchell
> This is Kevin Kiley
> CTO for Remote Communications, Inc.
>
> Comments are inline below.
> It's 2:00 AM here ( again ) so this will be a quick
> reply. I will try to provide more detail in the AM.
Hi, Kevin,
Thanks for your interesting reply.
The errors we're getting are on straightforward
text/html pages (no style sheets or Javascript).
Some of them are generated by cgi programs or by
server-side includes, but that shouldn't matter,
since the rctpd process just gets the html file
from the back-end server and doesn't care how
that back-end server decided what to send out.
The compression is fantastic. But I wonder if
there's any way to carry this off on a production
server that provides content to the general public.
It's OK if we send some of our clients uncompressed
content, but we need to be sure we're not sending
compressed content to browsers that don't interpret
it correctly (at least the "major" browsers --
recent versions of MSIE and Netscape, on both Windows
and the Mac, have to work on our content). We have
to live with whatever clients people happen to be
using, and we can't add any additional client-side
software.
I've interspersed some other comments below.
> > We tried out compression, and everything was working
> >beautifully (thanks, Kevin!). The only problem is that
> >some browsers are apparently displaying the compressed
> >content directly, without uncompressing it. (IE 5 for
> >the Mac is one example, and there seem to be others as
> >well, on both Windows and Mac platforms. I'm not sure
> >yet if any versions of Netscape exhibit the problem.)
> >It would seem that these browsers are claiming to accept
> >compressed content but then aren't uncompressing it.
> >
> > Is this a known problem?
>
> You betcha. There is currently no known browser which
> says it is fully HTTP 1.1 compliant which actually is.
>
> There actually is no known Server ( including Apache )
> which lives up to its full HTTP 1.1 rating, either.
>
> [Interesting IETF details snipped.]
>
> As far as the browsers appearing to not uncompress the
> data even after asking for it... there are currently any number
> of reasons why this might be so.
>
> Example: Even if every other text/* document
> decompresses fine and shows up fine... if it happens
> to be a 'style sheet' coming across then some versions
> of MSIE even 'screw up' because the folks who wrote
> the style sheet handling for the browser weren't required to
> do the full Content-decoding scheme as others were.
> Go figure. It's a mess.
That's not what we're running into. We're not doing
anything with style sheets -- we want to remain compatible
with as many browsers as possible.
> > Does anyone have a work-around,
>
> Sure. We do.
>
> We have 'client side software' that can turn ANY Internet
> program ( Browser, Email program, Custom APP, etc )
> into a fully capable IETF Content-decoding user agent.
>
> The software is NOT a 'plug-in'. You install it once and
> it turns ANY Internet software on that machine into
> a decoding-capable user-agent.
This sounds nice, but:
Our visitors are primarily home users and schools (K-12).
These people are in the public at large, so we have no way
to place software on the client machines (and, as you suggest,
they wouldn't be very receptive to that idea anyway).
Any work-around for us has to be on the server side.
> [Comments on users' unwillingness to change snipped.]
> >or a list of problem user agents (for which we could choose
> >not to compress the content)?
>
> This is coming!... and it is the only real solution.
>
> It has to be something akin to the BrowserMatch code in
> Apache which was thrown in to force a browser version
> to be automatically 'bumped down' so things don't
> screw up at certain sites. Concept is the same... there
> will always be a need to recognize certain user-agents
> and identify their capabilities. The situation of being able
> to rely on the protocol information alone just gets worse
> every day... not better.
>
> We have the code and the tables here already and it's going
> into mod_gzip. At one point in development of the 'lookups'
> we called the code 'WCYD' module ( What Can You Do )
> but in the past few weeks after seeing some creative
> TV commercials I am now more fond of calling it the
> WIYW interface ( What's in YERRRRR Wallet! ).
>
> ( I am joking, of course. It's very late here. )
Well, it's 3 a.m. here :-).
If we had a list of browsers and their capabilities,
we could probably use BrowserMatch or the like in
our front-end Apache to decide which requests to
send on to rctpd and which to simply process normally
(without compression).
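To make the idea concrete, here is a minimal sketch in Python of the decision
the front end would have to make. It is not mod_gzip or rctpd code. The names
PROBLEM_AGENTS, accepts_gzip and should_compress are hypothetical, and the one
pattern in the list is only an illustration based on the MSIE 5.0 failures
described in this message; a real table would need per-browser testing.

```python
import re

# Hypothetical blocklist. The single pattern is illustrative only, seeded
# from the MSIE 5.0 reports in this thread; it is not a tested table.
PROBLEM_AGENTS = [
    re.compile(r"MSIE 5\.0"),
]


def accepts_gzip(accept_encoding):
    """Return True if Accept-Encoding lists gzip or x-gzip with q > 0."""
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        token = parts[0].lower()
        if token not in ("gzip", "x-gzip"):
            continue
        q = 1.0  # a coding listed without a q-value is acceptable
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if q > 0:
            return True
    return False


def should_compress(user_agent, accept_encoding):
    """Compress only if the client asks for gzip and is not on the blocklist."""
    if not accepts_gzip(accept_encoding or ""):
        return False
    return not any(p.search(user_agent or "") for p in PROBLEM_AGENTS)
```

Anything that fails either check would simply be served uncompressed, which is
the safe default for us.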
> >Or could there be something else we're doing wrong
> >to cause this?
>
> See above. Try 'excluding' style sheet mime types and
> see if that was what looks 'messed up'.
>
> > When I tried IE 5 for the Mac, I saw
> >that http://www.remotecommunications.com came in fine;
> >is that page being compressed, and, if so, how are you
> >doing it?
>
> Our home page is not, itself, compressed and we do not
> use Content-negotiation for a static compressed version
> since ( frankly ) that doesn't even work the way it is
> supposed to, either.
>
> What might be a clue, however, is that we use SIMPLE
> HTML with no embedded objects, style sheets, or Java.
> We don't even have a Javac compiler on any of our
> Servers.
>
> So perhaps that might be 'part' of the issue... the more
> complicated and obfuscated the page and the more it
> is 'mixing mime types' the more likely the browser will
> screw up the IETF Content-decoding.
No, we're not doing anything like that. Everything is
simple on the client side: HTML files with GIF and JPEG
images; some of the GIFs are animated. There are no
style sheets, no Java, no Javascript, etc. We're even
using server-side imagemaps rather than client-side
ones. Some of these things are generated in a complicated
fashion on the server, but that shouldn't matter, since
rctpd only sees them after they're generated. (And, no,
we don't expect to compress images at all.)
> > We used a front-end Apache server set up as
> >a proxy (for logging purposes), pointing to an rctpd
> >process, which was fed from a back-end Apache server
> >doing all our usual stuff. The two Apache servers are
> >set up as virtual servers under a single Apache 1.3.12.
>
> Never tried that one.
It works :-).
> Question: If RCTPD simply had better logging would
> that allow you to eliminate the additional 'hop' at the
> front door? We can add this if you need it.
We might be able to eliminate the front-end server if
rctpd gave us logs like the Apache combined log; we'd
also need rctpd to pass through the host name (HTTP_HOST)
via some mechanism (a "From-Host:" header or something
like that). I'd have to think about it to be sure
everything else would still work. (Redirections can
be tricky, but having Apache up front means we can
use ProxyPassReverse to handle them correctly.)
But I think the question is moot (or premature). If
we can't solve the question of how to identify and
handle non-compliant browsers, we can't use any
compression at all. (The front-end proxy server
method appears to be working fine, and it offers
some flexibility. We might simply leave that alone.)
> >The test browsers that we had tried behaved perfectly,
> >and we thought we were all set, until we started hearing
> >from visitors.
>
> What exactly were you 'hearing'... what you mentioned?...
> that garbage was showing on some browser screens?
We've been able to duplicate all the errors ourselves.
Errors we're seeing are of three sorts:
1. Some pages just show up as gibberish. Presumably
this is the gzipped content, which the browser is failing
to uncompress. (MSIE 5.0, Windows and Mac. This seems
to be happening on files with a .html file extension.)
2. Sometimes MSIE says that it's receiving an unknown
MIME type and wants the user to download and save it (I
don't have it in front of me, but I think the MIME type
it's seeing is application/gzip-compressed or
application/x-gzip-compressed). These should be
text/html -- and that's how they show up when I turn
compression off. (MSIE 5.0, Windows and Mac. This
seems to be happening on files with a .shtml file
extension and on cgi output -- these are all marked
text/html by Apache.)
3. Sometimes part of the page comes in correctly, followed
by a small section of visible HTML tags, and then a dialog
box pops up saying that an unknown error occurred. (I've
only seen this on MSIE 5.0 under Windows, on a machine
running the N2H2 Bess child filtering program, behind a
proxy, at a local library. I suspect that N2H2 is
inserting its banner near the bottom of the page, without
realizing that they've accepted gzipped content that they can't
directly insert text into.)
I've only seen these errors on MSIE 5.0 (Windows and
Mac). Other browsers I've tried have worked without error,
but I haven't attempted to do an exhaustive survey of all
common browsers to see if others might fail too.
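One way to narrow down errors 1 and 2 would be to capture a response and
compare its headers with its body. The following Python sketch assumes the
headers are available as a dict and the body as raw bytes. The function name
diagnose and its return labels are hypothetical, and the two MIME types are
only my guess from memory above, not confirmed values.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

# MIME types guessed at in this message; unconfirmed.
SUSPECT_TYPES = (
    "application/gzip-compressed",
    "application/x-gzip-compressed",
)


def diagnose(headers, body):
    """Classify a captured response from its headers (dict) and body (bytes)."""
    h = {k.lower(): v.strip().lower() for k, v in headers.items()}
    ctype = h.get("content-type", "").split(";")[0].strip()
    cenc = h.get("content-encoding", "")
    looks_gzipped = body[:2] == GZIP_MAGIC

    if ctype in SUSPECT_TYPES:
        # The type itself names gzip: a browser may offer a download (error 2).
        return "gzip-mime-type"
    if looks_gzipped and cenc not in ("gzip", "x-gzip"):
        # Compressed bytes with no encoding label: likely gibberish (error 1).
        return "unlabeled-gzip-body"
    if cenc in ("gzip", "x-gzip") and not looks_gzipped:
        return "label-without-gzip-body"
    return "consistent"


# A small gzip-compressed sample body to try the function on.
sample = gzip.compress(b"<html><body>hello</body></html>")
```

A "consistent" result for a page that still fails in MSIE 5.0 would point at
the browser (or something in between, like the N2H2 proxy) rather than at
what the server sent.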
By the way, could any of the problems be related
to the old x-gzip header? Are some browsers expecting
x-gzip rather than gzip as the Content-Encoding value?
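For what it's worth, RFC 2616 says x-gzip should be treated as equivalent to
gzip. If some browsers do care about the spelling, one possible server-side
approach is to echo back whichever token the client listed. This is only a
sketch of that idea in Python; pick_content_encoding is a hypothetical name,
and the sketch deliberately ignores q-values to stay short.

```python
def pick_content_encoding(accept_encoding):
    """Return 'gzip', 'x-gzip', or None based on the tokens the client sent.

    q-values are ignored here, so "gzip;q=0" would still count as listed.
    """
    tokens = [
        item.split(";")[0].strip().lower()
        for item in accept_encoding.split(",")
    ]
    if "gzip" in tokens:
        return "gzip"
    if "x-gzip" in tokens:
        return "x-gzip"
    return None  # client listed neither spelling: send uncompressed
```

I don't know whether this would help with MSIE 5.0; it would only matter if a
browser sends one spelling and rejects the other.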
> > The whole set-up was quite nice; it would
> >be a shame if it didn't meet the needs of a production
> >web site due to buggy browser implementations.
> >
> > Any thoughts on this would be appreciated.
>
> See above. If you can't get someone to move up to
> a browser that even has a chance of being HTTP 1.1
> compliant then you will simply need to 'function enhance'
> that user agent with special software like the client-side
> modules that we have.
These just aren't options for us. But it would
be OK if we could just identify "bad" browsers and
send them uncompressed content.
> BTW: This problem is much broader than finally allowing
> people to receive 80 percent less HTML data in real
> time... which is something everyone can and should do.
>
> No... the problem goes deeper. If the Internet Server/Client
> model is ever going to move into new and better areas
> then no one can expect the 2 to ever 'stay in sync' and
> the addition of client-side software to function-enhance
> the core browsers is going to have to get much-better
> much-soon... or everything gets stuck in the mud.
>
> Netscape is already DEAD. It was 'Dead Man Walking'
> when AOL gobbled it up but that was more than a year ago
> and now it's really time for a Lily and the Headstone.
> Netscape has been 'buried' by AOL in favor of their
> ongoing use of MSIE.
>
> So alternatives are not broadening... they are narrowing.
> It's the 'Now all restaurants are Taco Bell' syndrome.
> Once a company starts issuing stock it is, by its very
> nature, forced to 'gobble up' and get as big as it can
> until it implodes. It's just the way of things. Stockholders
> just like to see 'forward momentum' no matter whether
> it's 'good' momentum or not.
>
> If people thought it was hard to get new features added
> to the Internet when there were at least 2 major browser
> providers then just wait and see how hard it will be
> when there is only one. ( AOL is simply a brown wrapper
> on MSIE, in case you thought for a moment that there
> was such a thing as an 'AOL browser' ).
Getting anything done in a reliable, consistent fashion
on the client side is virtually impossible. There are
just too many different clients and platforms out there.
I think the server side is much more approachable. You
only have to reach a smaller number of people, they're
technically more knowledgeable, etc.
> I guess this wasn't such a 'quick' reply after all but I will still
> try to provide a better answer tomorrow.
>
> Yours...
> Kevin Kiley
Thanks again for taking the time to reply.
Mitchell