[Mod_gzip] Apache 2: mod_deflate v. mod_gzip: The reason for the different compression results

mod_gzip@lists.over.net
Wed, 5 Mar 2003 20:08:13 +0200


Hi Stephen,


> I found a copy of mod_gzip 2.0.40 buried in a less than obvious
> place on the net

Was this the place that my link page
     http://www.schroepl.net/projekte/mod_gzip/links.htm
already points to, or something that I missed?

> and installed it. Needless to say, its compression performance
> almost exactly matches mod_gzip for Apache 1.3.x,

Does this 2.0.40 version use zlib, or does it contain Kevin
Kiley's own gzip code (which would introduce another potential
difference into the scenario)?
The 2.0.40 version I link to (there is even a 2.0.43 version
available at the moment) contains the lines

     char mod_gzip_version[] = "2.0.26.1a";
     #define MOD_GZIP_VERSION_INFO_STRING "mod_gzip/2.0.26.1a"

and is still at that level of development (missing a lot of
directives and having no gzip code of its own), as it is about
two years old now and has only been adapted to the
ever-changing Apache 2.0.x APIs.

In particular, it uses zlib - so we can meaningfully compare
it with mod_deflate, unlike mod_gzip 1.3.
And in this case I would really be surprised if even mod_gzip
2.0.40 beat mod_deflate in compression ratio, given identical
configuration parameters, as then both modules would actually
use the same gzipping code.

But - we'll see ... ;-)

> Sorry to see that mod_deflate is being offered by the Apache Group
> as an alternative to mod_gzip, because it does not match the
> compression performance of mod_gzip.

To my ears this sounds a little too much like bashing the
Apache Group (and the past shows that this only makes
communication more difficult and serves no other purpose).
So please ... let's be kind to each other, especially while
we still don't know what is actually going on.

Nevertheless, this failure to use compression level 9 as
efficiently as a command line "gzip -9" (which would be the
fair competition - let us just leave mod_gzip 1.3 out of
the discussion for a moment) should make us have a look at
this piece of code (which fortunately is much smaller than
mod_gzip and thus much easier to debug).

I only have an Apache 2.0.36 here, but its mod_deflate
might well be up to date, and it looks really simple (less
than 500 lines of source code). So I dare to have a look
into it, despite my humble knowledge of the C language ...

The compression level is set by a directive and then used in
exactly one code position:

        zRC = deflateInit2(&ctx->stream, Z_BEST_SPEED, Z_DEFLATED,
                           c->windowSize, c->memlevel,
                           Z_DEFAULT_STRATEGY);

This function call leads to somewhere outside the module -
probably to the zlib API ... just googled around a bit and
found
     http://www.kallisys.com/newton/zlib/doc/zlib.html
- bingo, this one explains the zlib API and this function.

Ian explicitly chose to use this function and not the simpler
"deflateInit()", which besides the stream takes just one
parameter (the compression level, much like the -1 to -9
switches of the gzip command-line version).

The "deflateInit2" function supports additional parameters
to fine-tune the compression algorithm, some (!) of which are
even exposed as Apache 2.0 configuration directives.
(Not all of them ... and this will be important.)

So after all, the reason why mod_deflate shows "inferior
compression rates relatively to mod_gzip on level 6 or 9"
might be those other parameter values, which may just not
suit your test case. I am still not sure whether we are
actually comparing identical compression scenarios.

Let's have a look at the way mod_gzip 2.0.43 is invoking
the zlib interface:

    rc =
    deflateInit2(
    &zlib_ctx->strm,
    compression_level,
    Z_DEFLATED,
    MOD_GZIP_ZLIB_WINDOWSIZE,
    MOD_GZIP_ZLIB_CFACTOR,
    Z_DEFAULT_STRATEGY
    );

Kevin Kiley used the same API call back then, only that
his parameter values were constants, not directive values:

     #define MOD_GZIP_DEFLATE_DEFAULT_COMPRESSION_LEVEL 6
     #define MOD_GZIP_ZLIB_WINDOWSIZE -15
     #define MOD_GZIP_ZLIB_CFACTOR    9

The second parameter "compression level" is the thing that
can be set by a configuration directive in mod_gzip 2.0.4x
and then overrides the default value of 6.

And now compare this to the way mod_deflate is using the
zlib API:

     zRC =
     deflateInit2(
     &ctx->stream,
     Z_BEST_SPEED,
     Z_DEFLATED,
     c->windowSize,
     c->memlevel,
     Z_DEFAULT_STRATEGY);

Note that the value we assumed to be "the compression level"
(1-9) is used as the fifth parameter (c->memlevel) in
mod_deflate, while it is used as the second parameter
(compression level) in mod_gzip 2.0.4x!

And the second parameter - where mod_gzip uses its directive
value - is the constant Z_BEST_SPEED in mod_deflate.

And guess what:

     const Z_BEST_SPEED = 1;

Does anyone still wonder why the results of mod_deflate
cannot be compared to those of mod_gzip?
==============================================================
mod_deflate is optimized for speed, not for compression ratio.
==============================================================

The two "level" directives indeed have different meanings:
mod_deflate lets you fine-tune the fifth parameter and sets
the second one to "speed optimized", while mod_gzip lets you
fine-tune the second one and uses a constant "9" for the
fifth parameter.
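
To make the mix-up concrete, here is a small stand-alone C sketch.
It does not call zlib: the struct and both functions are
hypothetical stand-ins of mine that only record which slot of
deflateInit2 each module's directive value ends up in, based on the
two calls quoted above. SKETCH_Z_BEST_SPEED mirrors the value of
zlib's Z_BEST_SPEED.

```c
/* Hypothetical stand-in for the two tuning slots of deflateInit2();
 * this sketch does not link against zlib. */
struct init_params {
    int level;     /* 2nd argument of deflateInit2: compression level */
    int mem_level; /* 5th argument of deflateInit2: memLevel */
};

enum {
    SKETCH_Z_BEST_SPEED = 1,     /* same value as zlib's Z_BEST_SPEED */
    SKETCH_MOD_GZIP_CFACTOR = 9  /* MOD_GZIP_ZLIB_CFACTOR as quoted above */
};

/* mod_gzip 2.0.4x: the directive value becomes the compression level. */
struct init_params mod_gzip_params(int directive_value)
{
    struct init_params p;
    p.level = directive_value;
    p.mem_level = SKETCH_MOD_GZIP_CFACTOR;
    return p;
}

/* mod_deflate: the directive value becomes memLevel; level stays fixed. */
struct init_params mod_deflate_params(int directive_value)
{
    struct init_params p;
    p.level = SKETCH_Z_BEST_SPEED;
    p.mem_level = directive_value;
    return p;
}
```

With a directive value of 9, mod_gzip_params(9) gives level 9 and
memLevel 9, while mod_deflate_params(9) gives level 1 and memLevel 9.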

Let's finally have another look at the zlib API documentation:

deflateStream:DeflateInit2(level, method, windowBits, memLevel, strategy)
Initialize the deflate stream with advanced options.
level     - The level of compression to use. This is exactly like level
           parameter of DeflateInit method.

memLevel  This must be an integer or NIL. If this is NIL, 8 is assumed
           (just as with deflateInit function).
           The value (normally between 1 and 9) is passed directly to the
           deflateInit2 function. Note: Steve's tests showed that you can
           run out of memory with 8 on an MP2000. Lower memory level means
           worst compression, but unlike modifications of the windowWidth,
           it doesn't affect the compatibility.

So the "level" for mod_gzip seems to be exactly the one that
we are used to from the gzip commandline version, while the
"memlevel" of mod_deflate is just something different and
mod_deflate works effectively like "gzip -1" (which it then
must be compared to).
It is pure bad luck that both parameter values happen to have
a range from 1 to 9 and even their names sound similar ...
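
What memLevel really trades is memory, not compression effort. As
far as I know, zlib's zconf.h gives the approximate memory need of
deflate as (1 << (windowBits+2)) + (1 << (memLevel+9)) bytes, plus a
few kilobytes for small objects. A minimal sketch of that formula
(the function name is my own, it is not part of zlib):

```c
/* Approximate memory (in bytes) that zlib's deflate needs, following
 * the formula in zconf.h:
 *     (1 << (windowBits + 2)) + (1 << (memLevel + 9))
 * window_bits is the absolute value (8..15), mem_level is 1..9.
 * deflate_memory_estimate is an illustrative name, not a zlib function. */
unsigned long deflate_memory_estimate(int window_bits, int mem_level)
{
    return (1UL << (window_bits + 2)) + (1UL << (mem_level + 9));
}
```

For windowBits 15 this yields 262144 bytes at memLevel 8 and 393216
bytes at memLevel 9; the compression level (the second parameter)
does not appear in the formula at all.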



As Ian Holsman, the mod_deflate author, is watching this
discussion, we happen to have the right person on the line:
Ian, did you have a reason for your parameter assignment?

Optimizing for speed is not that bad an idea, especially for
a filter - but why not make this "level" parameter
configurable as well via another directive, as it actually
_is_ important for the compression ratio?
Why not let the user decide how much CPU load to accept?
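
Just to sketch what I mean: the core of such a directive would be
little more than a range check on its argument. The helper below is
purely hypothetical (name and signature are mine); it leaves out how
a directive handler is registered with Apache and only shows the
validation of a value from 1 to 9.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper, not Apache or mod_deflate code: parse a
 * directive argument as a compression level in the range 1..9.
 * Returns 0 and stores the level on success; returns -1 and leaves
 * *level untouched if the argument is not a valid level. */
int parse_compression_level(const char *arg, int *level)
{
    char *end = NULL;
    long value;

    if (arg == NULL || *arg == '\0') {
        return -1;
    }
    errno = 0;
    value = strtol(arg, &end, 10);
    if (errno != 0 || *end != '\0' || value < 1 || value > 9) {
        return -1;
    }
    *level = (int)value;
    return 0;
}
```

The accepted value could then be passed as the second parameter of
deflateInit2 in place of the constant Z_BEST_SPEED.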

Or maybe use something like 6 (or at least 3, which would be
my suggestion) as the constant value there, and profit from
Kevin Kiley's and Peter Cranston's research into "the best
value for money"?

I would like to post some of the above as a feature request
for Apache 2.1 ... do I have to register on the developers'
mailing list?

And by the way, it would be helpful to explain in the
mod_deflate 2.0 documentation that the "mem_level" parameter
is _not_ the "level" parameter from the gzip command-line
version and that mod_deflate is intentionally optimized for
speed (and uses the equivalent of "gzip -1"), not for
compression ratio.


Regards, Michael