2010-07-07

Out-of-Band Referer For the Savvy Server

Tracking URL history via the URL itself

Websites love to track their users. Hits are a site's bread and butter, and it's important to know the who, what, when, where, how and why behind each one. Browser requests can include a Referer header telling the server which page the visitor came from:

GET / HTTP/1.1
Referer: http://www.example.com/

But Referer is only sent when a user follows a link from another page within the browser. The first page loaded in the browser has no Referer, and neither does any link that is copied and pasted or typed in manually.

When links are shared between people via email, instant messaging and chat rooms, it would be useful for sites to know who is doing all the sharing.

A possible solution exists: encode a unique identifier within the link itself. The obvious way is a dedicated token at the end of the URL, but more clever and subtle approaches are also possible, such as encoding the information in a subdomain prefix or a path component (as a token, or even as a permutation of letter case in the string). The server can normalize these back to the canonical URL with a RewriteRule.
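As a rough illustration of the token and case-permutation ideas, here is a minimal Python sketch. Everything in it is an assumption made for the example rather than part of any existing system: the query parameter name "r", the helper names, and the premise that canonical paths are all lowercase so that letter case is free to carry bits.

```python
import secrets
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

TOKEN_PARAM = "r"  # hypothetical parameter name, chosen for this sketch


def tag_url(url, token=None):
    """Return (tagged_url, token), with the token appended as a query parameter."""
    token = token or secrets.token_urlsafe(6)
    parts = urlsplit(url)
    # Drop any existing token so a re-shared link carries exactly one.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != TOKEN_PARAM]
    query.append((TOKEN_PARAM, token))
    return urlunsplit(parts._replace(query=urlencode(query))), token


def untag_url(url):
    """Split a URL into (canonical_url, token); token is None if absent."""
    parts = urlsplit(url)
    token = None
    query = []
    for k, v in parse_qsl(parts.query, keep_blank_values=True):
        if k == TOKEN_PARAM:
            token = v
        else:
            query.append((k, v))
    return urlunsplit(parts._replace(query=urlencode(query))), token


def encode_case(path, n):
    """Hide the integer n in the letter case of path (assumes a lowercase canonical path)."""
    out = []
    for ch in path.lower():
        if ch.isascii() and ch.isalpha():
            # Each ASCII letter carries one bit: uppercase means 1.
            out.append(ch.upper() if n & 1 else ch)
            n >>= 1
        else:
            out.append(ch)
    if n:
        raise ValueError("path has too few letters to hold this value")
    return "".join(out)


def decode_case(path):
    """Return (canonical_lowercase_path, n) recovered from the letter case."""
    n = 0
    bit = 0
    for ch in path:
        if ch.isascii() and ch.isalpha():
            if ch.isupper():
                n |= 1 << bit
            bit += 1
    return path.lower(), n
```

With these helpers, tag_url("http://www.example.com/page?x=1", "abc123") yields a URL ending in "?x=1&r=abc123", and encode_case("/about/team", 5) yields "/AbOut/team". The case trick only works if the server treats paths case-insensitively and maps them back to one canonical form, which is the normalization role the RewriteRule plays.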

Furthermore, once such a scheme identifies out-of-band referrers, any remaining hit that carries neither a Referer nor a token is likely a manual-entry hit (other possibilities do exist, such as bookmarks or browsers configured to suppress the header).
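The resulting decision can be sketched in a few lines of Python. The function name and the category labels are invented for this example, and the final category is only a guess about the visitor, as noted above.

```python
def classify_hit(referer, token):
    """Guess how a visitor arrived, given the Referer header and link token.

    Both arguments are strings, or None/empty when absent.
    The labels are hypothetical names used only in this sketch.
    """
    if referer and token:
        # A shared link that was itself posted on a web page and clicked.
        return "shared link followed in-band"
    if referer:
        return "in-band link"
    if token:
        return "out-of-band share"
    # No Referer and no token: most likely typed in, but not certainly.
    return "probably manual entry"
```

For instance, classify_hit(None, "abc123") reports an out-of-band share, while classify_hit(None, None) falls through to the manual-entry guess.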

Finally, each uniquely encoded link could also encode its parent link, meaning one could track an entire list (or tree) of out-of-band referrers.
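One way to sketch the parent tracking in Python is shown below. Note the assumption: rather than packing the whole ancestry into the URL, this version keeps a server-side table mapping each token to its parent and mints a fresh child token for every page view. The class and method names are hypothetical.

```python
import secrets


class ShareTree:
    """Records which link token each newly issued token descends from."""

    def __init__(self):
        self.parent = {}  # token -> parent token (None for a root)

    def mint(self, parent=None):
        """Issue a new token for a page view that arrived via `parent`."""
        if parent is not None and parent not in self.parent:
            parent = None  # unknown or forged token: start a new root
        token = secrets.token_urlsafe(6)
        self.parent[token] = parent
        return token

    def lineage(self, token):
        """Return the chain of tokens from the root down to `token`."""
        chain = []
        while token is not None and token in self.parent:
            chain.append(token)
            token = self.parent[token]
        return chain[::-1]
```

If a visitor arrives with token a and the page they see contains links tagged with b = tree.mint(a), then a later hit carrying b reveals that it descends from a, and tree.lineage(b) returns [a, b]. Two different children of the same parent form the branches of the tree.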