An excursion in mod_rewrite

While setting up my blosxom blog's apache vhost to rewrite URL:s to pass them by the blosxom CGI script, I happened to set up a redirection loop with apache's mod_rewrite module - by accident - and realized that mod_rewrite might be Turing complete. (Always my instinct when something has the ability to loop.) After some more thinking, I realized that this apache module really is a bona fide String rewriting system which can be used to implement both Tag systems and Cellular automata, both of which can be Turing complete. Actually, since mod_rewrite can implement any tag system or cellular automaton, mod_rewrite is Turing complete.

So, if one felt like it, mod_rewrite can be configured with a Turing complete set of rewrite rules such that each accessed URL is a program, and the final URL is the out-data (or a link to a CGI-script that gives back the final output to the user, with suitable decoding). It's possible (trivial, even) to add a first rewrite rule that bootstraps the rewriting with an interpreter string. It's theoretically possible to have this be something like a Haskell interpreter - imagine the possibilities! - a dog-slow Haskell interpreter implemented in an apache config file!

Here's a proof-of-concept implementation of a rewrite system framework in mod_rewrite. All that's missing is a set of rewrite rules to implement something interesting, like 99-bottles-of-beer, bon-digi or a Universal Turing Machine. Note that apache by default limits the number of "internal recursions" to 10, and with any serious use of this you're likely to hit that limit. Increase as needed.

# This file should be included in an Apache2 config file for a VirtualHost.

# Enable rewriting
RewriteEngine on

# First off, set a sensible log level so that you can see what goes wrong
RewriteLogLevel 3
RewriteLog "/var/log/apache2/rewrite/rewrite.log"

# This should only run on the first inputed string to add the interpreter.
# After each run of substitutions, we return to the first rule, so this must
# be safe and idempotent. So, we add the interpeter prefixed by '-' if the
# string does not start with '-'. (This could also be used for input programs
# to replace the interpreter if they like. We could also simply require input
# programs to come with their own interpreters and do nothing here.)
RewriteRule ^([^-].*)$ -INTERPRETER$1

# Add a comma to distinguish unchanged strings. This would be the termination
# condition for our rewrite system. Of course there are other ways to do this,
# for example a regexp which evaluates into the terminate-rule at the bottom.
# Doesn't need to be a comma, of course. If this character is part of the input
# alphabet, it could confuse some things though.
RewriteRule ^(.*)$ ,$1

# This is where the implementation of the entire string rewriting system would
# be. The general format is ,xyz -> xy'z - remove the first comma to mark the
# string as matched and changed (if there was a comma to start with), then
# change the string y to y' (with x and z left unchanged).
RewriteRule ^,?(.*)foo(.*)$ $1bar$2
# ... etc until your rewrite rules implement some turing-complete language, or
# a program that does something interesting.

# If the string has changed, we are not yet done. Loop to the beginning and
# remove the changed-marker. This doesn't change the string at all ('-' is a
# special substitution string that simple uses the original string), just has
# the [N] flag to trigger apache to restart rewriting.
RewriteRule ^[^,].*$ - [N]

# If the string is still unchanged, terminate and apply output-rewriting.

# First, remove the comma
RewriteRule ^,(.*)$ $1 
# Since the example string INTERPRETER is not actually interpreted in this
# example, remove it after applying the rewrite rules.
RewriteRule ^-INTERPRETER(.*)$ $1
# Finally, pass the resulting string to a simple CGI or PHP script to print it
# to the web browser. [L] means to abort rewriting here.
RewriteRule ^(.*)$ /print.php?$1 [L]

No responses so far