Rewriting URLs: Good for the Soul
The Caribbean property website I’ve been working on at Webdezign is nearing completion. Nothing to show quite yet, but I’m very happy about it, and I got to flex my UI design skills in the administration section (the front-end design was ably undertaken by Peter). One of the multitude of new things I had to learn with this project was rewriting URLs using Apache’s rewrite module, mod_rewrite, which is a very powerful tool even if I only used it in a limited way. So I’ve written a little case study about what I did.
Part 1: Search Engine Friendly Property URLs
One of the most enjoyable (or challenging, depending on what day it was) tasks was creating a unique identifier for each property without using the ID. I took tips from a fellow programmer and from the WordPress method and created a field for each property which contained what that property’s rewritten URL would look like. For this site, each property has a name, island, and accommodation type, so I used island/accommodation/name, for example:
st-lucia/villa/la-mer.
So instead of the URL for the property La Mer being found at www.site.com/property.php?id=3, which is all I’ve used so far, the pure unrewritten URL now reads www.site.com/property.php?prop=st-lucia/villa/la-mer. All the fun of checking for duplicates every time a new property was added or edited aside, the next step was the rewriting, which in this case was nice and simple:
RewriteRule properties/(.*)/ property.php?prop=$1
This would be found in the root directory’s .htaccess file. The rule only works in one direction: it takes the friendly URL that the browser requests and quietly maps it onto the real script, and it never turns property.php?prop=... back into the friendly form. The server detects when someone has gone to view a property at a page such as www.site.com/properties/st-lucia/villa/la-mer/, and then kicks into action. The (.*) bit in the rule is the clever bit: this is a little bit of regex that is designed to catch everything and anything [1], and it captures the full string found between the forward slash at the end of properties and the last forward slash of the URL. The brackets mean that this value is to be used as a reference, and since there is only one set of brackets, it is reference number 1. And hey! Look at that, there is $1 over in the other statement. So the captured value is then set as the prop value in the second statement. Not all that difficult, eh?
So for example, if someone put in this URL:
http://www.site.com/properties/barbados/hotel/john-smiths-house/
They would actually be served this page (internally, so the address bar keeps the friendly URL):
http://www.site.com/property.php?prop=barbados/hotel/john-smiths-house
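Put together, a fuller version of the file for this part might look something like the sketch below. It is not the exact file from the project: the RewriteEngine line, the ^ and $ anchors (explained in Part 2) and the [L] flag are assumptions added for the sketch, and it assumes mod_rewrite is loaded and that the host allows rewrite directives in .htaccess files.

```apache
# .htaccess in the site root (a sketch, not the project's actual file).
# Assumes mod_rewrite is enabled and AllowOverride permits these directives.
RewriteEngine On

# properties/st-lucia/villa/la-mer/  ->  property.php?prop=st-lucia/villa/la-mer
# In .htaccess the leading slash is already stripped from the matched path.
# (.+) insists on at least one character, so a bare properties// is not rewritten.
# [L] stops processing further rules for this pass once the rule has matched.
RewriteRule ^properties/(.+)/$ property.php?prop=$1 [L]
```

With this in place, a request for /properties/st-lucia/villa/la-mer/ should reach property.php with $_GET['prop'] set to st-lucia/villa/la-mer.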
Part 2: Content Managed Pages and Files
That was the gentle first step. The next hard bit in the specification was that they wanted stand-alone pages, which would have the URL structure www.site.com/page-name/. Ah okay, doesn’t sound too tricky, so I set up a little line like this in the .htaccess file:
RewriteRule ^(.*)/$ page.php?title=$1
Similar to before, but there are two new characters shown here. The ^ signifies the start of the string, and $ the end of the string. This means the pattern has to match the whole requested path, rather than any URL that merely contains a slash somewhere, so only the part directly after the base domain, up to the trailing slash, is captured. (Strictly, (.*) will also match slashes, so limiting this to a single path segment would take something like ^([^/]+)/$.) So this worked nice and fine, but I also wanted to rewrite my search.php file as search/, and errors came blaring up all over my search section. Why? Because the rewrite rule was rewriting search/ as page.php?title=search. Not so good. So I turned to ye olde WordPress and found exactly what I needed:
RewriteCond %{REQUEST_FILENAME} !-f
Brilliant! This goes directly before the rewrite rule, and basically says ‘only rewrite if the name being requested does not match a real file’. Problem solved. But what about a directory, such as admin/? Well, the way I did it is not quite a simple one-line job like with files: you need a line for every directory. It’s a nice easy format, so here’s an example for the admin:
RewriteRule ^admin/.*$ - [PT]
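So this means that anything (.* again) following admin/ is left untouched by the rewriting module: the - means ‘no substitution’, and the rule has to sit above the catch-all page rule so that it gets there first.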
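Here is roughly how the pieces from this part could fit together in one file. Treat it as a sketch under assumptions: the search rule is hypothetical (the post never shows it), the ^([^/]+)/$ pattern is a tightened variant of the page rule, and the !-d line is an addition that WordPress’s stock rules also use to skip real directories in a single line.

```apache
# .htaccess in the site root (a sketch; rule order matters).
RewriteEngine On

# 1. Leave everything under admin/ alone. "-" means no substitution.
RewriteRule ^admin/.*$ - [PT]

# 2. Hypothetical rule for the search page: search/ -> search.php
RewriteRule ^search/$ search.php [L]

# 3. Catch-all for content-managed pages.
#    A RewriteCond only applies to the RewriteRule immediately after it,
#    so both conditions guard this one rule and nothing else.
#    !-f : skip if the request maps to an existing file
#    !-d : skip if the request maps to an existing directory
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^/]+)/$ page.php?title=$1 [L]
```

Because the search rule sits above the catch-all and carries [L], search/ is handled before the page rule gets a look in, and the rewritten search.php no longer ends in a slash, so the catch-all leaves it alone on the next pass.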
Part 3: Multiple Variables on Island pages
Another interesting hurdle I found was when I tried to do pagination on a page that’s already being rewritten. I have individual pages running for five specific islands, each showing all the properties on that island. The URL structure used is www.site.com/islands/island-name/, so for example, www.site.com/islands/barbados/. The rewrite rule used was:
RewriteRule islands/(.*)/ island.php?name=$1
Now to set the pagination in motion, I needed to pass another variable to island.php called offset. I’m using a PHP pagination class that generates all the links and things from just a few input variables, but when I sent the variable in the URL as it was, errors came up: the rule only produces the one $_GET variable ‘name’, so a second one messed it right up. I couldn’t just append ?offset=x at the end either, for a similar reason, so the rewriting had to take both into account. Technically I needed two rules: one for when there are pages, using the structure islands/islandname/page/pagenumber/, and one for when page-browsing hasn’t happened yet, i.e. there is only one $_GET variable. So, I used two rules:
RewriteRule islands/(.*)/page/(.*)/$ island.php?name=$1&offset=$2
RewriteRule islands/(.*)/$ island.php?name=$1
The 2nd rule’s technicalities should be fairly familiar now, with the variable capture (.*) and the $ terminating the string. The 1st rule is similar, but this time we have two variables to capture, and each occurrence is given a reference number, starting from 1 (unlike most PHP iterations, which start from 0), so we get $1 and $2 in the second part of the rule. The more specific paged rule goes first, because the plain island pattern would otherwise match a paged URL too and swallow page/pagenumber into the name. This can’t quite go on forever, though: mod_rewrite only offers the references $1 to $9, so a single rule can use up to nine bracketed captures.
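For comparison, here is a slightly stricter sketch of the same pair of rules. The character classes, the optional trailing slash and the [L] flags are assumptions rather than what the site actually runs, and it takes the page number to be digits only. The commented-out line shows a different technique, the QSA flag, which the post does not use.

```apache
# .htaccess in the site root (a sketch with tightened patterns).
RewriteEngine On

# islands/barbados/page/2/  ->  island.php?name=barbados&offset=2
# [^/]+ stops each capture at the next slash; [0-9]+ accepts digits only.
RewriteRule ^islands/([^/]+)/page/([0-9]+)/?$ island.php?name=$1&offset=$2 [L]

# islands/barbados/  ->  island.php?name=barbados
RewriteRule ^islands/([^/]+)/$ island.php?name=$1 [L]

# Alternative (not the approach in this post): keep the single rule and add QSA,
# which appends the original query string to the rewritten one, so that
# islands/barbados/?offset=10 would arrive as name=barbados&offset=10.
# RewriteRule ^islands/([^/]+)/$ island.php?name=$1 [L,QSA]
```

Since [^/]+ cannot run across a slash, the plain island rule no longer matches paged URLs, which makes the ordering less fragile than with (.*).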
That’s all folks!
Hope you enjoyed me documenting my small foray into the intricacies of URL rewriting with PHP and Apache. I think I’m going to go find the regex for my own genetic code, because I can.
[1] To be specific, the . means any character that is not a line-breaking character, and the * means as many of the preceding item as you like (including none). So in theory, I could have used ([a-z-]*), which matches the characters between a and z, and a hyphen, as many times as needed (the multi-part property URLs would need the / in there too). But that may have been too clever.
Comments
Interesting read, thanks for that
Good stuff. I have been meaning to do some rewriterules for something at work, and this has gotten me off my arse and made me do it. So much nicer than query strings