Re: search engines [was Re: [Mod_gzip] Mod Gzip]

mod_gzip@lists.over.net
Tue, 21 Jan 2003 01:09:31 +0200


Hi Terry,


> Are gzipped pages still easily spidered by search
> engines?  Is it true that mod_gzip will detect that
> these agents don't have compression and will therefore
> send the normal uncompressed content to them?

A spider is no different from a browser, from the
server's point of view.
Actually, all the server can see is a set of HTTP
headers, and even those may lie about the identity
of the client.

So all mod_gzip is able to rely upon is whether the
client is sending an "Accept-Encoding: gzip" header
or not. If a spider were actually capable of
accepting gzip-encoded content, mod_gzip would
serve compressed content to it, regardless of the
"true nature" of the client. (Okay, I oversimplified
here; there are also configuration rules ...)
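To illustrate the point, here is a minimal sketch in
Python (not mod_gzip's actual C code) of a decision
made purely on the Accept-Encoding header. The names
accepts_gzip and negotiate are hypothetical, and the
sketch deliberately ignores mod_gzip's own
configuration rules. Note that the User-Agent is
never consulted at all.

```python
import gzip
from typing import Dict, Optional, Tuple


def accepts_gzip(accept_encoding: Optional[str]) -> bool:
    """Decide from the Accept-Encoding header alone whether gzip is acceptable.

    Simplified sketch: parses "coding;q=weight" items, treats q=0 as
    "not acceptable" and lets "*" stand in for gzip when gzip is not
    listed explicitly. mod_gzip's own configuration rules are NOT modelled.
    """
    if not accept_encoding:
        return False  # no header: send uncompressed content
    gzip_q: Optional[float] = None
    star_q: Optional[float] = None
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        coding = parts[0].lower()
        q = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        if coding in ("gzip", "x-gzip"):
            gzip_q = q
        elif coding == "*":
            star_q = q
    if gzip_q is not None:
        return gzip_q > 0
    return star_q is not None and star_q > 0


def negotiate(body: bytes, accept_encoding: Optional[str]) -> Tuple[Dict[str, str], bytes]:
    """Return (response headers, body), compressing only if the client asked for gzip."""
    # Vary tells caches that the response depends on this request header.
    headers = {"Vary": "Accept-Encoding"}
    if accepts_gzip(accept_encoding):
        headers["Content-Encoding"] = "gzip"
        return headers, gzip.compress(body)
    return headers, body
```

So a spider that sends no Accept-Encoding header gets
the plain body, while one that sends
"Accept-Encoding: gzip" gets the compressed one,
whatever its User-Agent claims to be.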

I remember an email exchange with a Google
engineer about which browsers support gzipped
content. I asked him why their spider didn't
request gzipped pages, even though their servers
do send gzipped content when appropriate: the
spider would use less bandwidth and could thus
collect page content faster.
He replied that he would forward the idea to the
spider programmers ... maybe someone here can find a
     "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
in their mod_gzip-enhanced log file?

Regards, Michael