[Mod_gzip] Re: mod_gzip v. mod_deflate
mod_gzip@lists.over.net
Tue, 4 Mar 2003 20:00:07 +0200
Hi Stephen,
> I am examining both mod_gzip for Apache/1.3.x and mod_deflate for
> Apache/2.0.x and had a question.
this looks like an interesting topic; so please allow me
to send a copy of my response to the mod_gzip mailing list
as well, although this one started as private mail.
> Is there a reason why the compression achieved by mod_gzip is 4%-7%
> greater than that achieved by mod_deflate?
Oh - I am really surprised to read this.
I would have expected mod_deflate to provide better values
(for an explanation see below).
I hope you have run mod_deflate at its maximum level.
And I hope that you are writing about actual compression rates
that you calculated yourself ... not about the percentage values
that mod_gzip reports (it rounds them up even when it should
round them down - I am not sure this is fixed even in 1.3.26.1a).
> I am not complaining on the whole about a 50%-80% reduction in
> transferred file-size, but I am curious to know what the differences
> are in the methods used for encoding the content.
First of all: there is no single "gzip compression".
The gzip algorithm allows for a number of compression levels;
the higher the level you use, the more effort will be spent
during compression. So you can trade compression effect for
CPU load and vice versa.
And be aware that compression is being used "on the fly" here,
so it _can_ make sense to use less power than the absolute
maximum, as your server may have other duties as well.
Try to make your own tests: Take some /usr/bin/gzip and some
file, then compress it with "gzip -3 <file>" and a copy of
the file with "gzip -9 <copy_of_file>", and then compare the
results. (Values between 1 and 9 are valid - try every level
and learn which one has which effect.)
Higher levels provide better compression, but the higher you
go, the longer the process will take; use large files to
measure the CPU time used.
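If you prefer a script over calling gzip by hand, the same
experiment can be sketched in Python. This is only a rough
sketch: it uses Python's zlib module (the same deflate levels
1 to 9, but without the gzip file header), and the function
name compare_levels and the sample data are my own assumptions.

```python
import time
import zlib


def compare_levels(data, levels=range(1, 10)):
    """Compress data once per level; return (level, size, seconds) tuples."""
    results = []
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        results.append((level, len(compressed), elapsed))
    return results


if __name__ == "__main__":
    # Assumed sample input; substitute the bytes of a large real file
    # to get meaningful timings.
    sample = b"<p>some repetitive markup</p>\n" * 20000
    for level, size, seconds in compare_levels(sample):
        print(f"level {level}: {size} bytes, {seconds:.4f} s")
```

Run it on your own files; the sizes and timings depend heavily
on the content.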
I myself have made some tests, and the gains of using higher
levels than 6 are really marginal, while the additional CPU
load can be rather big. I have also compared the results with
the ones mod_gzip produces, and I found identical values for
"gzip -6" and mod_gzip (see below for a reason why).
As for the code involved, mod_deflate relies on a standard
zlib being installed, while mod_gzip ships with its own
gzipping code (written by Kevin Kiley).
Kevin told me that he decided to use the equivalent of gzip
level 6 (as a "reasonable trade-off between CPU load and
compression effect") and didn't make this configurable.
Unfortunately, things are not as simple as having some
constant "6" inside the source code; if they were, Christian
Kruse would have made this level configurable by an additional
directive in the meantime. (He has already tried doing so.)
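On the other hand, mod_deflate allows you to explicitly tune
its zlib settings:
http://httpd.apache.org/docs-2.0/mod/mod_deflate.html#deflatememlevel
(Note that DeflateMemLevel sets how much memory zlib may use
for compression; the compression level itself is set by the
separate DeflateCompressionLevel directive, in versions of
mod_deflate that provide it.)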
Therefore I would have expected mod_gzip to compress a
little less efficiently than mod_deflate using level 9.
If you use mod_deflate with level 6, then I believe you
would compare the two compression methods on equal terms.
But you have to use them on the same set of files ... and
the number of files for your test case should be large
enough to be significant.
(Maybe you run a script over your Apache logfile, extract
all the URLs, then run a program requesting all these files
via HTTP from two different Apaches that cover the same
file tree?)
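Such a test script could look roughly like the following
Python sketch. It assumes an access log in Common Log Format
and only looks at GET requests; the function names and the
host names in the usage note are made up for illustration.

```python
import re
import urllib.request

# Matches the request part of a log line, e.g. "GET /index.html HTTP/1.0"
REQUEST_RE = re.compile(r'"GET (\S+) HTTP/[\d.]+"')


def extract_paths(log_lines):
    """Return the distinct GET paths in order of first appearance."""
    paths = {}
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            paths.setdefault(match.group(1), None)
    return list(paths)


def transferred_size(base_url, path):
    """Request one path with gzip allowed; return the body bytes received."""
    request = urllib.request.Request(
        base_url + path, headers={"Accept-Encoding": "gzip"}
    )
    # urllib does not decompress the body, so this is the size on the wire.
    with urllib.request.urlopen(request) as response:
        return len(response.read())


def compare_servers(paths, base_a, base_b):
    """Return (path, size_from_a, size_from_b) for every path."""
    return [
        (p, transferred_size(base_a, p), transferred_size(base_b, p))
        for p in paths
    ]
```

You would then call something like
compare_servers(extract_paths(open("access_log")),
"http://apache13.example", "http://apache20.example")
and sum up the two size columns.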
While on-the-fly compression forces a trade-off between CPU
load and traffic savings, static compression should be done
with the highest level possible, with the idea of "compress
once, serve everywhere".
So whether you use Apache's content negotiation to
conditionally serve statically precompressed versions of a
file, or whether you use mod_gzip 1.3.26.1a's feature of
automatically creating and updating these statically
precompressed files, you would always want to use "gzip -9".
Therefore it would even make sense to have a mod_gzip mode
for using the level 9 compression for statically
precompressed files, while a configuration directive to set
the gzip level for dynamic compression to level 3 might also
make sense if you have a server whose CPU is heavily loaded.
(Maybe one day we will understand Kevin's source code well
enough to make all this configurable ...)
My own tiny compression cache solution ("gzip_cnc", a Perl
CGI script embedded as a handler into Apache via the "Action"
directive, which can compress only static content) uses the
same concept as mod_gzip 1.3.26.1a (maintaining a cache of
precompressed file versions), so I am using level 9 for this
one. But it really doesn't make that much of a difference,
compared with level 6.
On the other hand, I have stats about the gzip_cnc cache
hit rate for my own domain, and although I sometimes
update some of the files that are then being read by some
regular visitors, the "cache miss" rate (precompressed
file older than original file) is only in the 1% range.
Therefore using level 9 seems to be a good idea for caching
solutions.
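The "precompressed file older than original file" check can
be sketched as follows. This is not the gzip_cnc code (which
is Perl), just a minimal Python illustration of the concept;
the function name ensure_precompressed is my own assumption.

```python
import gzip
import os
import shutil


def ensure_precompressed(original, compressed):
    """Create or refresh the gzip copy of a file.

    Returns True on a cache miss (the copy was missing or older than
    the original and had to be rebuilt), False on a cache hit.
    """
    stale = (
        not os.path.exists(compressed)
        or os.path.getmtime(compressed) < os.path.getmtime(original)
    )
    if stale:
        # Level 9: the file is compressed once and served many times.
        with open(original, "rb") as src:
            with gzip.open(compressed, "wb", compresslevel=9) as dst:
                shutil.copyfileobj(src, dst)
    return stale
```

Counting how often the function returns True against the total
number of calls gives the cache miss rate.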
This might even include running Squid 2.5 as a caching
front end and some Apache 2.0 as back end, as Squid 2.5
is the first (known to me) caching proxy that correctly
handles content negotiation. You can store the compressed
and the uncompressed version in parallel in Squid 2.5 if
your compression module serves the proper "Vary:" headers,
and Squid 2.5 would then be able to serve the compressed
versions from its cache even when the Apache would serve
the compressed content only conditionally (for example
depending on the "User-Agent" header to prevent compressed
content being served to broken browsers like Netscape 4).
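To illustrate why the "Vary:" header matters for a cache,
here is a strongly simplified Python sketch. It is not how
Squid is implemented; the class VaryCache and its methods
are invented for this example. The cache remembers which
request headers the response varies on and stores one
variant per combination of their values.

```python
class VaryCache:
    """Toy cache that keeps one variant per Vary'd header combination."""

    def __init__(self):
        self._vary = {}      # url -> tuple of lower-cased header names
        self._variants = {}  # (url, header values) -> response body

    def _key(self, url, request_headers):
        lowered = {k.lower(): v for k, v in request_headers.items()}
        names = self._vary.get(url, ())
        return (url, tuple(lowered.get(name, "") for name in names))

    def store(self, url, request_headers, vary, body):
        """Store a response; vary is the value of its Vary header."""
        names = [n.strip().lower() for n in vary.split(",") if n.strip()]
        self._vary[url] = tuple(names)
        self._variants[self._key(url, request_headers)] = body

    def lookup(self, url, request_headers):
        """Return the stored variant matching the request, or None."""
        if url not in self._vary:
            return None
        return self._variants.get(self._key(url, request_headers))
```

A request with "Accept-Encoding: gzip" and one without it then
map to two different entries for the same URL. A real proxy
would normalize the header values instead of comparing them
literally.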
Therefore, a combination of Squid 2.5 and Apache 2.0 with
mod_deflate running on level 9 might possibly be the most
effective compression solution available in the Apache
world - if mod_deflate on level 9 really beats mod_gzip.
Regards, Michael