[Mod_gzip] Need some exclude syntax help

mod_gzip@lists.over.net mod_gzip@lists.over.net
Tue, 9 Dec 2003 23:23:57 EST


Hi all...
This is Kevin Kiley...

Christian is right...

At the moment when mod_gzip is doing its FIRST PHASE
include/exclude checking... the Apache Server is executing
something called the 'type checker' hook and there are 2
different Apache pointers to the inbound request headers 
that relate to the METHOD line in the original request.

( The original inbound buffer is GONE by this point... it's been
parsed to pieces by Apache core code ).

r->uri  ( The path/filename WITHOUT any QUERY parameters )
r->unparsed_uri  ( Original request line + query parms but minus domain name )

When you use 'item include uri' it is referencing the 'r->uri' pointer
and NOT the 'r->unparsed_uri' pointer so the regular expression
check isn't going to 'see' any QUERY parms that may have been
on the end of the GET request line.
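
To make the difference concrete, here is a minimal Python sketch. It is
NOT mod_gzip or Apache code... it only mimics the split between the two
fields using the standard library. The sample URI and the 'nocompress'
parameter name are made up for illustration.

```python
import re
from urllib.parse import urlsplit


def split_request_target(target):
    """Return (path, query), roughly what r->uri holds versus the query part."""
    parts = urlsplit(target)
    return parts.path, parts.query


# Hypothetical request target, standing in for r->unparsed_uri.
unparsed_uri = "/cgi-bin/report.cgi?format=pdf&nocompress=1"
uri, query = split_request_target(unparsed_uri)

# A rule that looks for a query parameter name.
rule = re.compile(r"nocompress")

# Checked against the path only (like 'uri' rules): the parameter is invisible.
matches_uri = bool(rule.search(uri))
# Checked against the full target: the parameter is visible.
matches_unparsed = bool(rule.search(unparsed_uri))

print(uri, matches_uri, matches_unparsed)
```

The same pattern fails against the path alone and succeeds against the
full target, which is why a 'uri' rule never sees the query parms.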

The original version(s) of mod_gzip would, in fact, allow you to
use 'grep' expressions on 'r->unparsed_uri'.

It could be done with this simple config line...

mod_gzip_item_exclude reqheader 0: *some_query_parm*

The ZERO + COLON was a special indicator that meant
'apply this regular expression to the entire first line of
the inbound request header... query parms included'.
There is no actual 'field name' for the first line of an
HTTP request ( unless you count http: itself but that's
not always what might be on the front of the line )
so the ZERO + COLON was just a way to tell 
mod_gzip to use that first line for the 'grep' style
expression search.
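
As a rough illustration of the idea only, the sketch below treats a
field name of "0" as 'use the request line' and anything else as a
normal header lookup. The function names are hypothetical and this is
not how mod_gzip implemented it. It also ASSUMES the STAR in the
pattern behaves like a shell-style wildcard, which is why it uses
Python's fnmatch rather than a true regular expression.

```python
from fnmatch import fnmatchcase


def select_text(field, request_line, headers):
    """Pick the text a rule applies to; field "0" means the request line."""
    if field == "0":
        return request_line
    return headers.get(field)


def rule_matches(field, pattern, request_line, headers):
    """Return True if the selected text exists and matches the wildcard pattern."""
    text = select_text(field, request_line, headers)
    return text is not None and fnmatchcase(text, pattern)


# Hypothetical inbound request.
request_line = "GET /cgi-bin/report.cgi?some_query_parm=1 HTTP/1.1"
headers = {"User-Agent": "Mozilla/4.0"}

print(rule_matches("0", "*some_query_parm*", request_line, headers))
```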

When mod_gzip switched to using Apache's own 'parms' table
this 'first line' stopped being available. Only 'r->unparsed_uri'
comes close and even it is not the actual 'first line' that
arrived. It has also been parsed already.

I suppose we could bring back...

mod_gzip_item_exclude reqheader 0: *some_query_parm*

...and let people include/exclude based on query parameters
in the request line... but there have only been a few times when
this was even an issue in the last 3 or 4 years... and there
is ALWAYS a workaround for those people.

* WORKAROUND

Don't forget that any back-end server or CGI script ALWAYS
has the ability to 'tell' mod_gzip what it's supposed to do.

The moment there are QUERY parameters on the GET
line itself one can only assume that this is not just a 
static file request and some back-end script is GOING
to get called to handle this request. 

That being the case... it's ALWAYS easy for the back-end
script to tell mod_gzip whether or not it should compress
something.

The only thing to realize here is that the 'include/exclude'
decision will be happening in PHASE TWO of mod_gzip's
decision making process and not in PHASE ONE ( as
described above ).

All you have to do in your script, when YOU decide
that the output should NOT be compressed, is add a
special header to the response that tells mod_gzip
to 'back off' even if it thinks the response is supposed
to be compressed.

I would think this is where the decision making process
SHOULD be for these kinds of things. Anytime the 
decision to compress ( or not ) is based on the value
of a QUERY parameter then it is the back-end script
handler that SHOULD make the decision. Makes
sense to me.

So anytime you 'decide' that some response from your
back-end CGI should NOT be compressed... just add
a response header that looks something like this...

X-do-not-compress-this: dummy_value

...or whatever. That name is a little absurd but whatever you use
just be aware that you are TELLING mod_gzip something.
The header can be just a few characters long as long as mod_gzip
knows what to look for.

Using the example above... all you would have to add
to your mod_gzip configuration is the following...

mod_gzip_item_exclude rspheader X-do-not-compress-this: *

Voila.

Anytime mod_gzip sees a response header 'X-do-not-compress-this'
then it will do just that... it will NOT compress the response.

The value of the field doesn't matter as long as you just use
STAR as the search criteria and there is some 'dummy_value'
after the colon. It's the FIELD NAME itself that is going to
be recognized during mod_gzip's PHASE TWO decision
making process and used as a 'switch' to turn compression
on/off for any particular response.
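
For anyone who wants to see the script side, here is a minimal sketch
of a CGI script in Python that adds the header. The 'nocompress' query
parameter and the response body are assumptions made up for the
example... only the header name comes from the text above. It reads
QUERY_STRING from the environment the way any CGI program does.

```python
import os
import sys
from urllib.parse import parse_qs


def build_headers(query_string):
    """Build the response header lines, adding the opt-out header on request."""
    # keep_blank_values lets a bare '?nocompress' count as present.
    params = parse_qs(query_string, keep_blank_values=True)
    headers = ["Content-Type: text/html"]
    # 'nocompress' is a hypothetical parameter name chosen for this sketch.
    if "nocompress" in params:
        headers.append("X-do-not-compress-this: dummy_value")
    return headers


def main():
    headers = build_headers(os.environ.get("QUERY_STRING", ""))
    # Header lines, a blank line, then the body.
    sys.stdout.write("\r\n".join(headers) + "\r\n\r\n")
    sys.stdout.write("<html><body>report</body></html>\n")


if __name__ == "__main__":
    main()
```

The script decides, per request, whether the header goes out. The
server-side rule stays the same no matter which query parameter the
script chooses to honor.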

mod_gzip never starts compressing anything until AFTER
it has applied all the include/exclude rules to the 
RESPONSE header as well as the REQUEST header.
That's why you can just 'tell' mod_gzip what to do from
any back-end script.
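
The two-phase idea can be modeled in a few lines of Python. This is a
simplified sketch with hypothetical function names, not mod_gzip's
actual logic... phase one looks only at the request URI, phase two
looks at the response header names, and compression is allowed only
when BOTH phases allow it.

```python
import re


def phase_one_allows(uri, exclude_uri_patterns):
    """Request-side check: no exclude pattern may match the URI."""
    return not any(re.search(p, uri) for p in exclude_uri_patterns)


def phase_two_allows(response_headers, exclude_header_names):
    """Response-side check: no excluded header name may be present."""
    present = {name.lower() for name in response_headers}
    return not any(name.lower() in present for name in exclude_header_names)


def should_compress(uri, response_headers, exclude_uri_patterns,
                    exclude_header_names):
    """Compress only if both the request and response checks pass."""
    return (phase_one_allows(uri, exclude_uri_patterns)
            and phase_two_allows(response_headers, exclude_header_names))


# Hypothetical rule sets for the example.
uri_rules = [r"\.gif$"]
header_rules = ["X-do-not-compress-this"]

print(should_compress("/cgi-bin/report.cgi",
                      {"Content-Type": "text/html",
                       "X-do-not-compress-this": "dummy_value"},
                      uri_rules, header_rules))
```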

So the back-end script is now completely in charge of what 
gets compressed and what doesn't. Problem solved for 
anyone using QUERY parms for compression decision(s).

Later...
Kevin

PS: None of this is even remotely possible with the
mod_deflate that comes with Apache 2.0. It has no
such ability to include/exclude things like mod_gzip can/does.



In a message dated 12/9/2003 1:27:10 PM Central Standard Time, 
ckruse@wwwtech.de writes:


> Hi Michael,
> 
> On Tue, 9 Dec 2003 18:55:20 +0100 you wrote:
> 
> > "mod_gzip_item_exclude file" won't help you, and I am not sure whether
> > "mod_gzip_item_exclude uri" would see the query_string - this could be
> > removed already by some Apache API handling.
> 
> Unfortunately this is the case. Apache uses a request record structure
> that has the fields 'uri' and 'unparsed_uri'. 'uri' contains the URI of
> the requested document _without_ any query string while unparsed_uri
> contains the whole, unparsed URI. There are no checks against
> unparsed_uri.
> 
> Greetings,
> CK
> 
> -- 
> Your reasoning powers are good, and you are a fairly good planner.
> 

