[Mod_gzip] Wildcard in accept-encoding -> Bug in mod_gzip's "Accept-Encoding" parser?

mod_gzip@lists.over.net mod_gzip@lists.over.net
Mon, 21 Oct 2002 16:17:23 +0200


Hi Todd,


First of all, thanks for the good explanations.

>: I am working on a project which supports gzip encoded content. For this
>: purpose I add the following accept-encoding header to a request but the
>: returned content is never gzip'ed.
>:
>: Accept-encoding: *;q=0.001
>
> This is because, like "Accept: *", the server is conservative in what
> it sends and prefers what is explicitly specified.

And I think this conservative approach is a good idea, as we know
of browsers that send "Accept-Encoding: gzip" but are lying about
being able to handle it correctly (Netscape 4.x).

> For instance, while a web browser that presents
>    Accept: text/html, image/gif, image/jpg, *
> like most browsers do, the above line does NOT explicitly say image/png,
> so a web server would prefer to send some other image type to this
> browser.

Yep. If the browser wants to state what it prefers to get, it can
express this by adding quality values to the individual encodings.

The browser in question might send

     Accept-Encoding: gzip;q=1.0, identity;q=0.5, *;q=0

This would mean:
     "I prefer to receive gzipped content; as an alternative, give
      me the uncompressed version but definitely no strange encoding."

(Example taken from
     http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.3)
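To make the quality-value syntax concrete, here is a minimal C sketch
that splits an Accept-Encoding value into (coding, qvalue) pairs. The
names ae_entry and ae_parse are hypothetical and not part of mod_gzip.
The parser is simplified: it only looks at the q parameter, expects no
spaces around "=", and does not validate the qvalue syntax.

```c
#include <ctype.h>
#include <stdlib.h>

/* One "coding;q=value" element of an Accept-Encoding header
   (hypothetical type, for illustration only). */
typedef struct {
    char   coding[32];  /* e.g. "gzip", "identity", "*" */
    double q;           /* quality value; 1.0 if no q parameter is given */
} ae_entry;

/* Parse `header` into at most `max` entries; returns the number stored.
   Simplified: ignores parameters other than q and does not validate
   the qvalue syntax of RFC 2616 section 3.9. */
size_t ae_parse(const char *header, ae_entry *out, size_t max)
{
    size_t n = 0;
    const char *p = header;

    while (*p && n < max) {
        /* skip separators and whitespace before the coding token */
        while (*p == ',' || isspace((unsigned char)*p)) p++;
        if (!*p) break;

        /* copy the coding token up to ';', ',' or whitespace (truncated
           to fit the buffer) */
        size_t len = 0;
        while (*p && *p != ';' && *p != ',' && !isspace((unsigned char)*p)) {
            if (len < sizeof out[n].coding - 1) out[n].coding[len++] = *p;
            p++;
        }
        out[n].coding[len] = '\0';
        out[n].q = 1.0;

        /* scan this element's parameters until the next ',' */
        while (*p && *p != ',') {
            if (*p == ';') {
                p++;
                while (isspace((unsigned char)*p)) p++;
                if ((p[0] == 'q' || p[0] == 'Q') && p[1] == '=')
                    out[n].q = strtod(p + 2, NULL);  /* stops at ',' */
            } else {
                p++;
            }
        }
        if (len > 0) n++;
    }
    return n;
}
```

Fed the example header above, it yields gzip with q=1.0, identity with
q=0.5 and "*" with q=0.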

> Typically in HTTP, a * indicates that other formats can be handled in an
> out of band or non-inline fashion--i.e. not rendered "natively" like a
> nonencoded stream.  If, for instance, your "Accept-Encoding: *"  really
> were to indicate "send me any encoding as if it were native", I could
> legally send x-lharc or x-btoa or a number of other things that your
> application is very likely not capable of handling.

The procedure suggested by HTTP/1.1 (RFC 2616) in section 14.3 is as
follows:

# A server tests whether a content-coding is acceptable, according to an
# Accept-Encoding field, using these rules:
#
# 1. If the content-coding is one of the content-codings listed in
#    the Accept-Encoding field, then it is acceptable, unless it is
#    accompanied by a qvalue of 0. (As defined in section 3.9, a
#    qvalue of 0 means "not acceptable.")

This doesn't apply in our case, as the client didn't explicitly list
any content-coding it is able to understand.

# 2. The special "*" symbol in an Accept-Encoding field matches any
#    available content-coding not explicitly listed in the header
#    field.

This case applies: we are now free to send gzipped content if we
want to, but we aren't forced to do so either.
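Rules 1 and 2 can be sketched as a lookup over already-parsed entries:
an explicitly listed coding uses its own qvalue, otherwise a "*" entry
supplies one. This is a minimal sketch; ae_entry and ae_qvalue are
hypothetical names, and -1.0 is an arbitrary marker for "not covered by
the header at all". Matching is case-insensitive because RFC 2616
(section 3.5) defines content-coding values as case-insensitive.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strcasecmp (POSIX) */

typedef struct { char coding[32]; double q; } ae_entry;

/* Rules 1 and 2 of RFC 2616, 14.3: an explicitly listed coding uses its
   own qvalue; otherwise a "*" entry applies. Returns -1.0 when the coding
   is neither listed nor covered by "*" (hypothetical convention). */
double ae_qvalue(const ae_entry *list, size_t n, const char *coding)
{
    double star = -1.0;
    for (size_t i = 0; i < n; i++) {
        if (strcasecmp(list[i].coding, coding) == 0)
            return list[i].q;          /* rule 1: explicit entry wins */
        if (strcmp(list[i].coding, "*") == 0)
            star = list[i].q;          /* rule 2: remember the wildcard */
    }
    return star;
}
```

For Todd's header "*;q=0.001", ae_qvalue() returns 0.001 for "gzip":
acceptable, but only through the wildcard, which is exactly the
situation described above. A result of 0 still means "not acceptable".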

#
# 3. If multiple content-codings are acceptable, then the acceptable
#    content-coding with the highest non-zero qvalue is preferred.

This case doesn't apply (yet, but see below).

# 4. The "identity" content-coding is always acceptable, unless
#    specifically refused because the Accept-Encoding field includes
#    "identity;q=0", or because the field includes "*;q=0" and does
#    not explicitly include the "identity" content-coding.

So the "identity" encoding would also be acceptable.

#    If the Accept-Encoding field-value is empty, then only the
#    "identity" encoding is acceptable.

This doesn't apply, as the field value wasn't empty.
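Rule 4, together with the empty-field case, could look like this in C.
Again a sketch with hypothetical names, operating on entries that have
already been parsed; an empty header value corresponds to n == 0.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strcasecmp (POSIX) */

typedef struct { char coding[32]; double q; } ae_entry;

/* Rule 4 of RFC 2616, 14.3: "identity" is acceptable unless refused by
   "identity;q=0", or by "*;q=0" with no explicit "identity" entry. */
int ae_identity_acceptable(const ae_entry *list, size_t n)
{
    int star_refuses = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcasecmp(list[i].coding, "identity") == 0)
            return list[i].q > 0.0;    /* an explicit entry decides */
        if (strcmp(list[i].coding, "*") == 0 && list[i].q == 0.0)
            star_refuses = 1;          /* "*;q=0" refuses unlisted codings */
    }
    return !star_refuses;              /* n == 0: identity is acceptable */
}
```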

# If an Accept-Encoding field is present in a request, and if the server
# cannot send a response which is acceptable according to the Accept-
# Encoding header, then the server SHOULD send an error response with
# the 406 (Not Acceptable) status code.

Doesn't apply: we have at least two acceptable encodings (gzip via
"*", and identity).
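Putting rules 1 to 4 and the 406 clause together, a server-side choice
among the codings it has available might be sketched as follows
(hypothetical names, not mod_gzip code). Two assumptions are not taken
from RFC 2616 and are labeled in the comments: an identity coding that
is only implicitly acceptable ranks below any explicit non-zero qvalue,
and ties are broken in favor of the server's own ordering.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strcasecmp (POSIX) */

typedef struct { char coding[32]; double q; } ae_entry;

/* Effective qvalue of `coding` under rules 1, 2 and 4 of RFC 2616, 14.3;
   0.0 means "not acceptable". Assumption: an identity coding that is only
   implicitly acceptable gets 1e-9, i.e. it ranks below every non-zero
   qvalue a client can send (the smallest legal one is 0.001). */
static double effective_q(const ae_entry *list, size_t n, const char *coding)
{
    double star = -1.0;
    for (size_t i = 0; i < n; i++) {
        if (strcasecmp(list[i].coding, coding) == 0)
            return list[i].q;                 /* rule 1: listed explicitly */
        if (strcmp(list[i].coding, "*") == 0)
            star = list[i].q;                 /* rule 2: wildcard */
    }
    if (star >= 0.0)
        return star;                          /* also covers "*;q=0" */
    return strcasecmp(coding, "identity") == 0 ? 1e-9 : 0.0;  /* rule 4 */
}

/* Picks the available coding with the highest non-zero qvalue (rule 3).
   Assumption: ties keep the earlier entry in `avail`, so a conservative
   server lists "identity" first. Returns -1 when nothing is acceptable,
   i.e. the server SHOULD answer 406 (Not Acceptable). */
int ae_choose(const ae_entry *list, size_t n,
              const char *const *avail, size_t navail)
{
    int best = -1;
    double best_q = 0.0;
    for (size_t i = 0; i < navail; i++) {
        double q = effective_q(list, n, avail[i]);
        if (q > best_q) {
            best = (int)i;
            best_q = q;
        }
    }
    return best;
}
```

With Todd's "*;q=0.001" and avail = { "identity", "gzip" }, both codings
tie at 0.001 and the sketch picks "identity", matching the conservative
behavior argued for here.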

# If no Accept-Encoding field is present in a request, the server MAY
# assume that the client will accept any content coding.
# In this case, if "identity" is one of the available content-codings,
# then the server SHOULD use the "identity" content-coding, unless it
# has additional information that a different content-coding is meaningful
# to the client.

This seems to be as close as we can get to the case we are talking
about.
Literally speaking, we _had_ an Accept-Encoding field, but it
didn't give us any additional information, so we might conclude
that we are allowed to send whatever we want.
But in this case the SHOULD indicates that sending the uncompressed
content is the best we can do here.

Therefore I would rather not make this configurable in mod_gzip.
The module is already doing the best it can, and the client could
have used an established mechanism (listing "gzip" explicitly) to
ask for gzipped content.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

The good thing about this discussion is that I believe I have
found a real bug in mod_gzip.

The detection of the "Accept-Encoding: gzip" information is
currently performed in the following way:


     tablestring = ap_table_get(r->headers_in, "Accept-Encoding");

      #ifdef MOD_GZIP_DEBUG1
      mod_gzip_printf( "%s: r->headers_in->Accept-Encoding = [%s]",
                        cn,npp(tablestring));
      #endif

      if ( tablestring )
        {
         #ifdef MOD_GZIP_DEBUG1
         mod_gzip_printf( "%s: 'Accept-Encoding' field seen...", cn);
         mod_gzip_printf( "%s: Checking for 'gzip' value...", cn);
         #endif

         if ( mod_gzip_stringcontains( (char *)tablestring, "gzip" ) )
           {
            #ifdef MOD_GZIP_DEBUG1
            mod_gzip_printf( "%s: 'gzip' value seen...", cn);
            #endif

            accept_encoding_gzip_seen = 1;
           }

where the mod_gzip_stringcontains function is actually doing a substring
match for the "gzip" value inside the "Accept-Encoding" header.

But what if the browser sends this one:

     Accept-Encoding: deflate;q=1, gzip;q=0, identity;q=0.5, *;q=0.1

This would mean: "I prefer deflated content; I would rather take
uncompressed content than something I don't know anything about;
but I definitely cannot handle gzipped content."
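The current check can be approximated with a plain substring search.
This sketch assumes mod_gzip_stringcontains behaves like strstr (the
real helper may differ, for example in case handling);
naive_gzip_seen is a hypothetical name.

```c
#include <string.h>

/* Approximation of the current check: any occurrence of "gzip" in the
   header value counts as "gzip accepted", whatever its qvalue. */
int naive_gzip_seen(const char *accept_encoding)
{
    return strstr(accept_encoding, "gzip") != NULL;
}
```

For the header above it reports gzip as seen, even though "gzip;q=0"
explicitly forbids it.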

The HTTP definition says:
"If a parameter has a quality value of 0, then content with this parameter
is `not acceptable' for the client."
(cited from
     http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.9)

So the existence of the "gzip" substring MUST NOT be taken as an
invitation to send gzipped content without further investigation -
it might even be used to explicitly _prevent_ the server from
sending gzipped content.

This looks to me as if the

    if ( mod_gzip_stringcontains( (char *)tablestring, "gzip" ) )
      {
       #ifdef MOD_GZIP_DEBUG1
       mod_gzip_printf( "%s: 'gzip' value seen...", cn);
       #endif

       accept_encoding_gzip_seen = 1;
      }
construction has to be replaced by a more sophisticated piece of
code, or the 'mod_gzip_stringcontains' function has to deal with
quality values.
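One possible shape for such a q-aware check is sketched below. It is
not mod_gzip code: accepts_gzip_explicitly is a hypothetical name, the
parser is simplified, and it deliberately ignores "*", in line with the
conservative behavior discussed above. It also treats "x-gzip" as
gzip, as RFC 2616 (section 3.5) recommends.

```c
#include <ctype.h>
#include <stdlib.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/* Hypothetical q-aware replacement for the substring test: returns 1
   only if "gzip" (or its alias "x-gzip") is listed explicitly with a
   non-zero qvalue. A bare "*" is deliberately not treated as a request
   for gzip. Simplified parser: ignores quoted strings and non-q params. */
int accepts_gzip_explicitly(const char *header)
{
    const char *p = header;

    while (*p) {
        /* skip separators and whitespace before the coding token */
        while (*p == ',' || isspace((unsigned char)*p)) p++;
        const char *tok = p;
        while (*p && *p != ';' && *p != ',' && !isspace((unsigned char)*p)) p++;
        size_t len = (size_t)(p - tok);

        double q = 1.0;                  /* default when no q= is given */
        while (*p && *p != ',') {        /* scan this element's parameters */
            if (*p == ';') {
                p++;
                while (isspace((unsigned char)*p)) p++;
                if ((*p == 'q' || *p == 'Q') && p[1] == '=')
                    q = strtod(p + 2, NULL);
            } else {
                p++;
            }
        }

        int is_gzip = (len == 4 && strncasecmp(tok, "gzip", 4) == 0) ||
                      (len == 6 && strncasecmp(tok, "x-gzip", 6) == 0);
        if (is_gzip)
            return q > 0.0;              /* q=0 means "not acceptable" */
    }
    return 0;
}
```

In the excerpt above, such a helper could take the place of the
substring test, e.g. accept_encoding_gzip_seen =
accepts_gzip_explicitly(tablestring);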



Regards, Michael



P.S.: I have posted this description as a bug report to the
      SourceForge site for mod_gzip.