This page describes how to use Apache's mod_rewrite module with WebKit, and mod_webkit in particular. It starts with a general overview and then presents some recipes for solving specific problems.
Related reading: mod_rewrite reference docs, mod_rewrite guide
mod_rewrite is an Apache module that is used to remap URLs. Here's how the Apache manual describes it:
This module uses a rule-based rewriting engine (based on a regular-expression parser) to rewrite requested URLs on the fly. It supports an unlimited number of rules and an unlimited number of attached rule conditions for each rule to provide a really flexible and powerful URL manipulation mechanism. The URL manipulations can depend on various tests, for instance server variables, environment variables, HTTP headers, time stamps and even external database lookups in various formats can be used to achieve a really granular URL matching.
It is an incredibly flexible tool that can be used for a wide range of tasks. Here are some examples:
change the file layout of your site without breaking bookmarked or search-engine indexed URLs
make a certain directory or file visible in every subdirectory of your site without needing to symlink to it in every subdir or manage complex relative paths.
use mod_webkit to serve files that appear to the user as if they are in Apache's top directory ('DOCUMENT_ROOT'). Without mod_rewrite, each URL would look like www.mysite.com/WK/MyServlet.py. With mod_rewrite, you can use a simpler URL like www.mysite.com/MyServlet.py
redirect to files on remote servers
parse 'variable assignment URL prefixes' out of URLs before passing them on to WebKit, and add the variables to the CGI environment dictionary (trans.request().environ())
To use mod_rewrite you need to compile it with Apache and then enable it in your httpd.conf configuration file. See the Apache docs for compilation instructions. To enable it in your httpd.conf file, uncomment or add the following lines in the DSO section in part 1:
LoadModule rewrite_module modules/mod_rewrite.so
AddModule mod_rewrite.c
-- TavisRudd - 05 Mar 2002
To use ModWebKit, you'll usually have some lines like this in your httpd.conf:
<Location /WK>
    WKServer localhost 8086
    SetHandler webkit-handler
</Location>
Then /WK/Servlet will run Servlet.py in the default context.
The problem with this is that you now have an implementation-dependent fragment "WK" (standing for "WebKit") in your application URLs. This is considered bad for various reasons. Ideally, the URL should reflect only the content and be human-readable (you might want to read the passage about The right URL for your link from the tutorial The Care and Feeding of Hyperlinks). You cannot simply omit the Location "/WK" or replace it with "/" if you want everything to be served through WebKit (I don't know why, but it won't work). However, you could replace "WK" with a memorable and reasonable keyword pointing to the application itself, not the technology behind it.
The more sophisticated solution is to map the URL using the URL manipulation capabilities of Apache provided by mod_rewrite. Perhaps you want to map different URLs to different contexts, or even to the root of the website:
RewriteEngine On
## NOTE: put no space between L and PT!
RewriteRule ^(.*) /WK/LocalSite/$1 [L,PT]
The L flag means "this is the LAST rule, don't apply any more" and PT means "PASS-THROUGH the changed URI to the next handler." You can leave the L off, but the PT is essential if you are using mod_webkit. See the mod_rewrite reference guide for an explanation. (The note sits on its own line because Apache does not allow a comment on the same line as a directive.)
To redirect all images to a static location, so that there's no overhead from calling Webware, you would put this rewrite directive before the above:
RewriteRule ^(.*\.(jpg|gif|png))$ /images/$1 [L,PT]
For a virtual domain, do something like:
<VirtualHost bobsyouruncle.com>
    ...
    RewriteEngine On
    RewriteRule ^(.*)$ /WK/BobsYourUncle/$1 [L,PT]
</VirtualHost>
This assumes /WK (not virtual) is where the AppServer is located, and that you've created a context BobsYourUncle for that virtual domain.
What if you aren't using mod_webkit? Say you are using a CGI adapter at http://mainhost.com/cgi-bin/WebKit.cgi ? Then just replace /WK with /cgi-bin/WebKit.cgi in all these examples. _(can someone confirm this for virtual domains?)_ You can use [L] instead of [L,PT] if you aren't using mod_webkit. _(Do you need [L,PT] if you are using mod_snake, etc?)_
-- IanBicking ... with changes by TavisRudd - 05 Mar 2002
If the image location is the same as the original URL, use a hyphen as the second argument. You don't need $1's ()s in this case:
RewriteRule ^.*\.(jpg|gif|png)$ - [L,PT]
-- MikeOrr - 10 Dec 2001
I had two problems. The first was that I put my httpd.conf rewrite rules in a <Directory> such as / or the doc root. This gave either a repeating path that was HTTP Forbidden, or gave no response. The solution was to put the rules above the first <Directory>.
The second problem was that if I put a slash in front of $1, the path could not be found.
Finally, I was able to remove the ^ and $, which aren't needed to capture the whole path. My final solution was:
...
DocumentRoot "/usr/local/apache/htdocs"
RewriteEngine On
RewriteRule (.*) /webkit/ContextName$1 [L,PT]
<Directory />
...
I used Apache 1.3.22 (built from source) on Mandrake Linux 8.1.
-- ChuckEsterbrook - 22 Jan 2002
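The slash problem described above can be illustrated with a small Python sketch. It assumes the rule runs in server context, where mod_rewrite matches against a URL path that already begins with "/", so (.*) captures that leading slash. The helper expand() is hypothetical: it only imitates $N substitution and is not part of Apache or Webware.

```python
import re

def expand(pattern, substitution, url_path):
    """Imitate one RewriteRule: match the pattern, then fill in $N back-references."""
    match = re.search(pattern, url_path)  # RewriteRule patterns are unanchored searches
    if match is None:
        return url_path  # the rule does not apply; the path is left alone
    return re.sub(r"\$(\d)", lambda ref: match.group(int(ref.group(1))) or "", substitution)

# Assumption: server context, so the path handed to the rule begins with "/".
without_slash = expand(r"(.*)", "/webkit/ContextName$1", "/index.py")
with_slash = expand(r"(.*)", "/webkit/ContextName/$1", "/index.py")

print(without_slash)  # /webkit/ContextName/index.py
print(with_slash)     # /webkit/ContextName//index.py
```

Under that assumption, the extra slash in front of $1 produces a doubled slash in the rewritten path, which may be why the path could not be found.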
I used .htaccess files to set the rewrite rule, and had the following rules as above:
RewriteEngine On
RewriteRule ^(.*\.(jpg|jpeg|gif|png)) /images/$1 [L,PT]
RewriteRule ^(.*) /cgi/OneShot.cgi/MyContext/$1 [L,PT]
This caused the same recursion that was mentioned above in another case. To solve it, I went to /cgi/.htaccess and placed this there:
RewriteEngine Off
This solved it. Now I can get to actually try to build my application.
-- BaruchEven - 17 Feb 2002
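A rough Python model of the recursion in the .htaccess case above, and of why switching the engine off under /cgi ends it. This is a simplification, assuming that per-directory rules are applied again to the rewritten path; the function name, the engine_off_under parameter and the max_rounds cutoff are all made up for illustration.

```python
PREFIX = "cgi/OneShot.cgi/MyContext/"

def rewrite_rounds(path, engine_off_under=None, max_rounds=4):
    """Re-apply the catch-all rule as if the rule set were re-entered after each rewrite."""
    rounds = 0
    while rounds < max_rounds:
        if engine_off_under is not None and path.startswith(engine_off_under):
            break  # models "RewriteEngine Off" in that directory's .htaccess
        path = PREFIX + path  # models: RewriteRule ^(.*) /cgi/OneShot.cgi/MyContext/$1
        rounds += 1
    return path, rounds

looping = rewrite_rounds("index.py")                       # prefix keeps piling up
fixed = rewrite_rounds("index.py", engine_off_under="cgi/")  # one rewrite, then stop

print(looping)
print(fixed)
```

In this model the unguarded rule keeps prefixing its own output until the arbitrary cutoff, while the guarded version rewrites once and then leaves the /cgi path alone.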
To clarify, if you want Webware to serve URLs in the top-level directory (/index.py) without a /WW prefix AND serve certain files statically:
1 RewriteRule ^/images(.*) - [L]
2 RewriteRule ^/pix/(images|thumbs)/(.*) - [L]
3 RewriteRule ^/WW($|/.*) - [L]
4 RewriteRule ^(.*) /WW/$1 [L,PT]
(The line numbers are not in the Apache file.)
(1) and (2) rewrite /images, /pix/images and /pix/thumbs to their original URLs ("-") and use "L" to prevent rule 4 from executing.
(3) rewrites URLs with the WebKit prefix to their original URLs, and uses "L" to prevent 4 from executing, which would wrongly rewrite /WW/index.py to /WW/WW/index.py
(4) does most of the work, rewriting /index.py to /WW/index.py so that users don't have to type the /WW prefix. "PT" is necessary or the request won't be passed properly to WebKit.
I don't know if it's possible to make WebKit serve directly from / using <Location /> AND serve some files statically. If it were possible, you'd need to supersede the "SetHandler webkit-handler", thus:
<Location /WW>
    WKServer localhost 8086
    SetHandler webkit-handler
</Location>
<Location /images>
    SetHandler apache-core-default-handler
</Location>
but I don't know how you set the handler back to the default once you've set it to something else.
-- MikeOrr - 16 Apr 2002
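The four-rule recipe above depends on rule order and on the L flag. The following Python sketch shows only which rule would fire for a given URL path. The patterns are copied from the recipe; the RULES table and first_matching_rule() are hypothetical names, and Python's re module stands in for Apache's regex engine.

```python
import re

# (rule number, pattern, substitution) copied from the four rules above.
RULES = [
    (1, r"^/images(.*)", "-"),
    (2, r"^/pix/(images|thumbs)/(.*)", "-"),
    (3, r"^/WW($|/.*)", "-"),
    (4, r"^(.*)", "/WW/$1"),
]

def first_matching_rule(url_path):
    """Return (rule number, 'unchanged' or 'rewritten') for the first rule that matches."""
    for number, pattern, substitution in RULES:
        if re.search(pattern, url_path):
            # Every rule carries L, so processing stops at the first match.
            return (number, "unchanged" if substitution == "-" else "rewritten")
    return (None, "unchanged")

for path in ["/images/logo.png", "/pix/thumbs/a.jpg", "/WW/index.py", "/index.py"]:
    print(path, first_matching_rule(path))
```

Only requests that fall through rules 1 to 3 reach rule 4, so static paths and already-prefixed paths are never given a second /WW prefix.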
My server is hosting 3 unrelated URLs. Each URL resolves to a virtual host. One of these virtual hosts is used as the example.
The Apache config for the Virtual Server includes both the Location directive as well as the RewriteRule directives. This allows for different configs for each of the three virtual hosts.
The RewriteRules accomplish two things.
One, the third rule adds the "/wk" to the start of the URL passed to Apache. This makes Webware happy. However, the first rule ensures that a "/wk" does not already exist in the URL, as two "/wk" values would be a problem.
Two, the second rule sends requests for images used by mywebcontext to the proper non-context directory. The "[L,PT]" on both the second and third rules ensures that only one of those rules is processed for a given request.
Given the server directory structure:
/pub
    /httpd
        /myweb
            /mywebcontext
            /mywebimages
The Webware context is:
'Contexts': {'mywebcontext':'/pub/httpd/myweb/mywebcontext', 'default':'mywebcontext'}
The Apache config is:
<VirtualHost *>
    DocumentRoot /pub/httpd/myweb
    ServerName www.e-myweb.com
    #
    # Added For WebWare / WebKit Support
    #
    RewriteEngine On
    RewriteRule ^/wk/(.*) /$1 [R]
    RewriteRule ^/images/(.*) /mywebimages/$1 [L,PT]
    RewriteRule ^/(.*) /wk/$1 [L,PT]
    <Location /wk>
        SetHandler python-program
        # add the directory that contains ModPythonAdapter.py
        PythonPath "sys.path+['/usr/local/Webware/WebKit']"
        PythonOption AppWorkDir /pub/httpd/myweb
        PythonHandler ModPythonAdapter
        PythonDebug On
    </Location>
</VirtualHost>
-- TacticalJack - 04 Sep 2002
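The three rules in the virtual host configuration above can be traced with a short Python sketch. The function route() is a hypothetical name. The [R] rule is modeled as stopping with an external redirect, which is a simplifying assumption about how the first rule interacts with the other two.

```python
import re

def route(url_path):
    """Return (action, new_path) for one request, following the three rules in order."""
    match = re.search(r"^/wk/(.*)", url_path)
    if match:
        # [R]: external redirect to the URL without the /wk prefix
        # (assumption: modeled as stopping here; the browser then requests the new URL).
        return ("redirect", "/" + match.group(1))
    match = re.search(r"^/images/(.*)", url_path)
    if match:
        # [L,PT]: images are served from the non-context directory.
        return ("internal", "/mywebimages/" + match.group(1))
    match = re.search(r"^/(.*)", url_path)
    if match:
        # [L,PT]: everything else gets the /wk prefix that the Location block expects.
        return ("internal", "/wk/" + match.group(1))
    return ("internal", url_path)

for path in ["/wk/index.py", "/images/logo.png", "/index.py"]:
    print(path, route(path))
```

A request for /wk/index.py is redirected to /index.py, and the follow-up request for /index.py is then rewritten internally to /wk/index.py.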
I use this trick:
<Location />
    WKServer localhost
    SetHandler webkit-handler
</Location>
<Location /_>
    WKServer localhost
    SetHandler webkit-handler
</Location>
RewriteEngine On
RewriteRule $^ /_/ [L,NS]
This makes WebKit serve everything including the root folder.
Another one is:
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} -F
RewriteRule ^(.*) - [L]
This tells Apache to check whether it has a file to serve for the requested path and that the path is not a directory. If both hold, Apache serves the file; otherwise WebKit handles the request as usual. That's a very nice thing to have!
This enables you to have files like CSS and favicon in the root of the site, yet in a different directory than the code (in htdocs or something), and no special cases are needed for different file types and so on, unlike the recipes above.
Another possible use is to generate static cached pages for the paths that put the most load on your appserver (don't forget to delete them eventually).
Make sure to have this for the root folder:
Options -Indexes
Otherwise Apache will spit out a directory listing instead of letting WebKit respond. My Apache treats the root folder as special: if it can be listed, it passes the "RewriteCond %{REQUEST_FILENAME} -F" test and gets served, even though it is really a folder. Maybe that is a misconfiguration of mine.
--maluke http://maluke.com/ 17 Sep 2004
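The "serve the file if Apache has it, otherwise let WebKit handle it" recipe above can be approximated in Python. This sketch replaces the -F test with a plain filesystem existence check, which is an assumption (as far as I know, -F also checks accessibility through a subrequest). The function name apache_serves_it() and the sample paths are made up for illustration.

```python
import os
import tempfile

def apache_serves_it(document_root, url_path):
    """Approximate the two RewriteCond tests with plain filesystem checks."""
    candidate = os.path.join(document_root, url_path.lstrip("/"))
    is_directory = os.path.isdir(candidate)  # RewriteCond %{REQUEST_FILENAME} !-d
    exists = os.path.exists(candidate)       # stand-in for RewriteCond ... -F
    return exists and not is_directory

# Build a throwaway document root holding a single static file.
with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "favicon.ico"), "w").close()
    results = {path: apache_serves_it(root, path)
               for path in ["/favicon.ico", "/", "/MyServlet"]}

print(results)
```

In this sketch only /favicon.ico is served statically. The root folder and the servlet path both fall through to WebKit, which is the behaviour the Options -Indexes setting is meant to preserve for the root folder.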