Configuring Flexible Domains and SubDomains in Apache 2.0

I spent a few hours tonight researching and testing different ways to configure Apache 2 on FreeBSD to allow me to use subdomains on my sites, with the following goals:

The standard name-based virtual host setup, NameVirtualHost followed by a series of <VirtualHost> entries, is not flexible enough to let me do everything I wanted. Every domain would need its own entry... screw that.
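For comparison, the standard name-based setup looks something like the sketch below, with one block per site. The hostnames and paths are placeholders, not taken from a real config.

```apache
# One NameVirtualHost line, then one <VirtualHost> block per site.
NameVirtualHost *:80

<VirtualHost *:80>
    ServerName www.domain1.com
    ServerAlias domain1.com
    DocumentRoot /your/web/root/domain1.com
</VirtualHost>

<VirtualHost *:80>
    ServerName www.domain2.com
    ServerAlias domain2.com
    DocumentRoot /your/web/root/domain2.com
</VirtualHost>
```

Every new domain, and every subdomain with its own directory, means another block like these.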

There are many ways I found in Apache 2 to accomplish different subsets of these goals. Apache's mod_vhost_alias is insanely easy to use: just load the module, set up one VirtualDocumentRoot directive, and you can easily automate domains, mapping domain1.com to /your/web/root/domain1.com and domain2.com to /your/web/root/domain2.com ad nauseam.

However, mod_vhost_alias is rather inflexible when it comes to subdomains. Firstly, it splits hostnames on their dots, so while it's easy to specify a simple subdomain path for subdomain.domain.com, deeper subdomains like sub.do.main.domain.com are more difficult... there may be a way to accomplish this, but it wasn't readily apparent to me.

Also, I found no way to have the subdomain default to www. That means if your webroot is /web and your domain is domain.com, then www.domain.com points to /web/domain.com/www and domain.com points to /web/domain.com/. Usually you can finagle something with links on a *nix system, but you can't have /web/domain.com/ point to /web/domain.com/www, so you would have to link the other way around... but then you have a link pointing to its own parent, so where does www.domain.com/www go?
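For reference, the simple mod_vhost_alias case (one directory per full hostname) can be sketched roughly like this. The module path is an assumption and varies by install; /your/web/root is the placeholder path used above; %0 is mod_vhost_alias's interpolation for the whole requested hostname.

```apache
# Module path is an assumption; adjust it for your install.
LoadModule vhost_alias_module libexec/apache2/mod_vhost_alias.so

# Use the hostname the client sent in the Host: header
# rather than the configured ServerName.
UseCanonicalName Off

# %0 expands to the full requested hostname, so a request for
# domain1.com is served from /your/web/root/domain1.com.
VirtualDocumentRoot /your/web/root/%0
```

Note that with %0, www.domain1.com and domain1.com are two different directories, which is exactly the defaulting problem described above.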

So, it came down to mod_rewrite's RewriteRules, which are almost as notorious for their power as for their confusing syntax. However, they are extremely flexible.

It took me about 2 hours to find the right combination of RewriteRules to do what I wanted, and here's what I ended up with:

$ tail -n 5 /usr/local/etc/apache2/httpd.conf
RewriteEngine On
RewriteMap sub prg:/usr/local/etc/apache2/sub.pl
RewriteCond %{HTTP_HOST} ^(?:(.*?)\.)?([0-9a-z_-]+\.[a-z]+)$ [NC]
RewriteRule ^/(.*)$ /virtualhosts/%2/${sub:%1|www}/$1 [L]

$ cat /usr/local/etc/apache2/sub.pl
#!/usr/bin/perl
$|++;   # flush output, program will fail w/o this
# simply return what we're fed, or "NULL\n" if input is empty
print ($_ eq "\n" ? "NULL\n" : $_) while <>;

Explanation:

RewriteEngine On

Pretty straightforward -- enables RewriteRules

RewriteMap sub prg:/usr/local/etc/apache2/sub.pl

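This line defines "sub" as the name of a map, which I can call later like a function with the ${sub:some_argument} syntax. The prg:/path/to/file part tells Apache that whenever I look something up in "sub", it should hand the argument to the program at /usr/local/etc/apache2/sub.pl. Apache starts that program once when the server starts and talks to it over stdin and stdout, one line per lookup, which is why the script has to flush its output.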

RewriteCond %{HTTP_HOST} ^(?:(.*?)\.)?([0-9a-z_-]+\.[a-z]+)$ [NC]

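This line picks the hostname apart. The last group, ([0-9a-z_-]+\.[a-z]+), captures the meat of the domain at the end ("domain.com") as %2. The optional group in front, (?:(.*?)\.)?, captures any subdomains before it as %1, without the trailing dot; if there are none, %1 is empty. [NC] makes the match case-insensitive.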

RewriteRule ^/(.*)$ /virtualhosts/%2/${sub:%1|www}/$1 [L]

The meat of the whole thing. It matches the path of the requested URL and then, using the matches from the RewriteCond above it (%1 and %2, the subdomain and domain, respectively) along with its own match ($1, the path), constructs the destination path. The ${sub:%1|www} part passes the subdomain match in %1, which could be "www", "some.random.sub.domain" or empty, to the Perl script we defined earlier. The script echoes back the special string "NULL" if the match is empty, or the full match otherwise. If "NULL" comes back, Apache treats the lookup as failed and falls back to the "www" after the | as a default. Finally, the path in $1 is appended to the end. The query string is not part of what RewriteRule matches; Apache carries it over to the rewritten request unchanged.

Here are some examples:

Original request: http://www.domain.com/script.php?a=1&b=2
%1 is "www"
%2 is "domain.com"
$1 is "script.php" (the query string a=1&b=2 is not part of the match; it is passed along unchanged)
Page gets served internally as: /virtualhosts/domain.com/www/script.php?a=1&b=2

Original request: http://a.b.c.domain.com
%1 is "a.b.c"
%2 is "domain.com"
$1 is empty
Page gets served internally as: /virtualhosts/domain.com/a.b.c/

Original request: http://domain.com/dir/page.html
%1 is empty (and defaults to "www")
%2 is "domain.com"
$1 is "dir/page.html"
Page gets served internally as: /virtualhosts/domain.com/www/dir/page.html
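To check mappings like these on your own server, mod_rewrite in Apache 2.0 can log each step of a rewrite. A minimal sketch follows; the log path is a placeholder, and the directives are the 2.0-era names, so later releases may use something different.

```apache
# Log path is a placeholder; pick any location the server can
# write to at startup.
RewriteLog /var/log/apache2-rewrite.log

# 0 turns logging off; higher levels are more verbose and slower,
# so turn this back down once the rules behave.
RewriteLogLevel 3
```

With this in place, each request should leave a trace of the pattern being applied and the rewritten path, which makes it easier to see whether a hostname ended up where you expected.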

Possible Problems

If the domains automatically generate paths, isn't it possible for Apache to calculate a path that doesn't exist?
Yup, sure is. Put in wer3r232.parseerror.com. Fortunately, it will register as a simple 404, just like if you typed ww.parseerror.com by mistake.

Since the path is automatically generated, is it possible to access files outside of the webroot by crafting a special subdomain?
Honestly, I'm not sure. I'm fairly confident that proper security settings (mainly telling Apache that everything outside of the DocumentRoot is off-limits) should keep people from being able to take a look at your system. Also, to craft a path on a *nix machine you'd need periods (.) and slashes (/), neither of which goes well in a subdomain: a slash is an illegal character, more than one consecutive period in a domain is invalid syntax, and so is any escape sequence, like %2F (the URL-encoded form of a slash). Nevertheless, there is always a possibility.
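The "everything else is off-limits" idea can be sketched like this in Apache 2.0 syntax. The first part is the usual deny-by-default pattern. The second part is an assumption, not part of the configuration shown earlier: a stricter variant of the RewriteCond that only allows letters, digits, underscores and hyphens in each subdomain label, so a Host header containing slashes or empty labels simply doesn't match and the rule never fires.

```apache
# Deny access to the whole filesystem by default...
<Directory />
    Order deny,allow
    Deny from all
</Directory>

# ...then open up only the tree the RewriteRule maps into.
<Directory /virtualhosts>
    Order allow,deny
    Allow from all
</Directory>

# Hypothetical stricter host match (would replace the original
# RewriteCond): %1 is one or more dot-separated labels made only
# of [0-9a-z_-], %2 is still the trailing "domain.tld" part.
RewriteCond %{HTTP_HOST} ^(?:((?:[0-9a-z_-]+\.)*[0-9a-z_-]+)\.)?([0-9a-z_-]+\.[a-z]+)$ [NC]
```

Treat this as a starting point to test, not a guarantee; the original (.*?) is more permissive, and that looseness is what the question above is worried about.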

Ramble

I am not an apache genius nor do I know if this will even do everything I think it will. Do not blindly trust what I've done here. However, I did want to make this information available to anyone who is looking to do the same thing as me, because I've looked for myself and couldn't find any documents describing this process. Maybe there's a reason for that... I just don't know why.
Thu Feb 5 00:59:03 CST 2004