One topic that continues to mystify some developers is the .htaccess file and the accompanying mod_rewrite module for Apache HTTP Server. The module, which is used for URL rewriting, is driven by rules that can contain complicated regular expressions and very powerful configuration directives. URLs that make sense to the user, and that are considered SEO friendly, can be translated behind the scenes into the paths the application actually serves. It is often argued that the more keyword rich and “friendly” the URL, the better optimized it is for search engines.
However, the process by which URLs are rewritten has changed significantly with the advent of PHP application frameworks. The vast majority of these frameworks are built upon the MVC architecture design paradigm. Now, not only does content get associated with a URL, but components of the application are associated with pieces of the URL. Almost all of the magic that transforms the URL path into a viewable page is hidden deep inside the code. This can be difficult to understand at first, but getting a grip on some basic concepts will help you to hack even the most intricate URL rewriting schemes.
The Front Controller
The Front Controller is a software design concept. When a Web application is using a front controller, all Web page requests are funneled through a single point of entry as instructed by the .htaccess. In PHP, the front controller often sits in the Web root, and is named index.php. No matter what URL you type into the address bar, if the application is using a front controller, the request will likely hit the index.php file. This is not unique to PHP or MVC. For instance, Spring, a Java MVC framework, uses a front controller. Even WordPress, which is not MVC, uses a front controller.
It should be mentioned that a front controller and SEO friendly URLs are not synonymous with one another, yet the two do have a close relationship. Rather than writing a rewrite rule for every page, the majority of modern PHP applications store a URL path reference in the database, and use it to retrieve page specific data. This is actually the preferred method for mapping URLs, since managing every rewrite rule in a single .htaccess file can be complicated. If you want to give an admin the ability to modify a URL path (alias) for a page, then it has to be stored somewhere.
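To make the database approach a little more concrete, here is a minimal sketch in PHP. A plain array stands in for the database table, and the names $aliases and resolveAlias() are hypothetical; they are not taken from any particular application.

```php
<?php
// Hypothetical stand-in for a database table that maps a URL alias to a page ID.
// A real application would read this from the database instead.
$aliases = [
    'our-services/web-design'      => 2,
    'our-services/web-development' => 3,
];

/**
 * Return the page ID stored for a request path, or null when no alias matches.
 */
function resolveAlias(array $aliases, string $requestPath): ?int
{
    // Normalize so "/our-services/web-design/" and "our-services/web-design" match.
    $key = trim($requestPath, '/');
    return $aliases[$key] ?? null;
}

$pageId = resolveAlias($aliases, '/our-services/web-design/');
```

Because the alias is just data, an admin screen could change it without anyone touching the .htaccess.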
Beginner Front Controller: Let .htaccess Shine
There is some debate as to whether or not this is actually a front controller design, since no index.php file is utilized. However, I thought it would be worthwhile to demonstrate how URL rewrites work at the highest level before moving on to advanced solutions. Remember, this is not the preferred approach, but it is a good primer. The following is an example of one-to-one relationship URL rewrites in an .htaccess. If you have fewer than a dozen pages on a small Web site, this can be a useful quick fix.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule ^our-services/strategic-consulting$ /our_services.php?id=1 [L]
RewriteRule ^our-services/web-design$ /our_services.php?id=2 [L]
RewriteRule ^our-services/web-development$ /our_services.php?id=3 [L]
</IfModule>
The IfModule conditional directive wraps a section of rewrite rules, and ensures that they are only processed on a server that has mod_rewrite loaded. If the module were missing and there were no conditional directive, users would receive an HTTP 500 Internal Server Error. It is good practice to always use this directive.
According to the documentation, “The RewriteEngine directive enables or disables the runtime rewriting engine.” RewriteEngine can be declared multiple times within .htaccess, which can actually help with debugging. The directive can be turned on or off, and instead of commenting out or deleting a line during testing, it is best to use RewriteEngine Off. In the example there is only the need to turn the rewrite engine on once.
The RewriteRule directive is one of the most powerful aspects of mod_rewrite. I invite you to bookmark and eventually read the official documentation for RewriteRule. There are too many facets to this directive to cover them all, so I will mention just a few.
- RewriteRule uses pattern substitution through regular expressions. So, the more you know about regular expressions, the more powerful your rewrite rules can become.
- Each rule begins with RewriteRule, and multiple rules can be applied to a single URL string before the page request is passed along to a PHP (or other) file.
- Definition order is important. Rules are executed top down, and a rule is only skipped if it is commented out or if a flag on an earlier rule, such as [L] or [S], stops or skips processing.
- The first portion of a rule is the pattern to match. The caret anchor (^) designates the beginning of the string being matched, and the dollar anchor ($) designates the end. The string in question is the path from the URL that the user typed into the address bar; in an .htaccess file it is matched without the leading slash.
- The second portion of the rule designates the file path that should actually handle the request behind the scenes if, and only if, a pattern match was found in the first portion of the rule. You can use regular expression groups and backreferences to pass information from the first portion of the URL to the second portion of the URL. More on that later.
- The final portion of the rule is a flag. It provides additional flexibility without the use of further pattern substitutions. The flag in the example above is “last rule”, and states that if a match was found, no further rules should be processed. The request is then forwarded on to the application. Multiple flags can be separated by a comma, all of them contained between a single set of brackets.
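If it helps to see the matching logic outside of Apache, the points above can be imitated in a few lines of PHP. This is only an illustration of top-down matching with a "last rule" style stop, not how mod_rewrite is implemented; the $rules array and the rewritePath() function are hypothetical names.

```php
<?php
// Each entry mirrors a RewriteRule from the example: pattern => substitution.
$rules = [
    '^our-services/strategic-consulting$' => '/our_services.php?id=1',
    '^our-services/web-design$'           => '/our_services.php?id=2',
    '^our-services/web-development$'      => '/our_services.php?id=3',
];

/**
 * Walk the rules top down and return the substitution of the first match,
 * which is roughly what the [L] flag asks for. Returns null if nothing matches.
 */
function rewritePath(array $rules, string $path): ?string
{
    foreach ($rules as $pattern => $target) {
        // "#" is used as the delimiter so slashes in the pattern need no escaping.
        if (preg_match('#' . $pattern . '#', $path) === 1) {
            return $target;
        }
    }
    return null;
}
```

Calling rewritePath($rules, 'our-services/web-design') hands back the second substitution, and a path that matches no pattern falls through untouched.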
Intermediate Front Controller: A WordPress Hack
Now that you are familiar with .htaccess and the front controller, it is time to move on to something a bit more useful: a WordPress hack. One of the common complaints among WordPress users and developers is that it is difficult to add pages that contain WordPress features and functions, but that can also be fully customized programmatically. For our purpose, we are going to add a product page to WordPress. The intention is to get you started, but not to provide a fully featured e-commerce solution.
The following is the .htaccess file that is included after WordPress is installed, and when SEO friendly URLs are enabled:
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
The first thing to note is that you should never edit anything between the # BEGIN WordPress and # END WordPress comments. If you later change how WordPress formats your URLs in the admin, your changes to that part of the .htaccess will be overwritten. Also, since WordPress implements a front controller, you must do all of your rewriting before the first WordPress comment. This allows your custom rules to be processed uninterrupted.
Before continuing, I do want to highlight the RewriteCond directives above because they are so useful. They basically state that if a request is made for any file or directory that physically exists in the file system, serve that up instead. This allows users to download physical files, like a PDF or ZIP, and still take advantage of the front controller. If you had another application in another folder, you would also want that to continue to function without being hijacked by WordPress.
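The decision those two conditions make can be sketched in PHP as well. This is an approximation for illustration only: resolveHandler() is a hypothetical function, and it does nothing to guard against paths containing "..", which a real implementation would need to handle.

```php
<?php
/**
 * Decide what should handle a request, mimicking the two RewriteCond checks:
 * an existing file or directory is served as-is, anything else goes to index.php.
 */
function resolveHandler(string $documentRoot, string $requestPath): string
{
    $root      = rtrim($documentRoot, '/');
    $candidate = $root . '/' . ltrim($requestPath, '/');

    // Same spirit as the "!-f" and "!-d" tests: only rewrite when neither exists.
    if (is_file($candidate) || is_dir($candidate)) {
        return $candidate;
    }
    return $root . '/index.php';
}
```

A request for a PDF that sits in the Web root comes back unchanged, while a friendly URL with no file behind it is handed to index.php.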
Here is the URL scheme for the product page:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule ^catalog/([0-9a-zA-Z]+)/[-0-9a-zA-Z]+/?$ /wp-content/themes/dbug/product.php?asin=$1 [L]
</IfModule>
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
I am going to make a few assumptions here. The first is that the product we want to pull is a product from the Amazon.com catalog. Each of these products will have an ASIN identifier, which is what is used to retrieve the product data. Whether that is through a call to Amazon Web Services, or a call to the WordPress database, is up to you. The important thing to remember is that the format of the URL should also be SEO friendly. The following URL will match the pattern provided above:
http://www.yourdomain.com/catalog/0321344758/dont-make-me-think-a-common-sense-approach-to-web-usability/
Remember how I mentioned groups and backreferences earlier in the post? The rule above has such a backreference. The dollar sign and number one used in the second portion of the rule are a backreference to the group of numbers and letters inside the parentheses in the first portion of the rule. This is our ASIN, which the application needs.
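If you want to convince yourself of what the group captures, the same pattern can be tried with PHP's preg_match(). This is only a test harness for the regular expression; the variable names are mine, and Apache does not run this code.

```php
<?php
// Same pattern as the RewriteRule; "#" delimiters avoid escaping the slashes.
$pattern = '#^catalog/([0-9a-zA-Z]+)/[-0-9a-zA-Z]+/?$#';

// The request path as mod_rewrite sees it in .htaccess: no leading slash.
$path = 'catalog/0321344758/dont-make-me-think-a-common-sense-approach-to-web-usability/';

$target = null;
if (preg_match($pattern, $path, $matches) === 1) {
    // $matches[1] plays the role of $1 in the RewriteRule substitution.
    $target = '/wp-content/themes/dbug/product.php?asin=' . $matches[1];
}
```

For the sample URL, $matches[1] holds the ASIN 0321344758, and the second group of the URL (the keyword slug) is matched but deliberately not captured.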
Since the theme for my personal blog is called dbug, I placed a product.php page in that theme directory. The ASIN is passed along to the page through the asin query string variable so that I can use it later for processing. Now all I need to do is code the product page.
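<?php
// Load WordPress so its functions and database connection are available.
include_once(__DIR__ . "/../../../wp-blog-header.php");

// Print the ASIN passed along by the rewrite rule, escaped for safe HTML output.
echo htmlspecialchars($_GET["asin"] ?? "", ENT_QUOTES, "UTF-8");
?>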
That was simple! When you include the WordPress file wp-blog-header.php, you have access to all of the standard WordPress functions, including the database connection. You could build out a page that includes your common header, footer and navigation, and pull an Amazon.com product into it using the ASIN. Of course, there is a lot more to be done, but this should get you started.
Advanced Front Controller: MVC
So now we move on to the masterpiece, the front controller in MVC. I am not going to detail a full solution, since that would require an entirely new tutorial, but I am going to identify some key points using Magento Commerce as the sample application. Magento is built on top of Zend Framework, and many of the concepts will translate to that framework, as well as others.
For product pages Magento uses a database field to map a URL alias to a product. This makes sense, since marketers need the ultimate control over keywords that appear in SEO friendly URLs. However, Magento also follows the MVC design, so the application needs to dissect the URL for this purpose as well. A check is performed first for the existence of a controller that maps to the URL, and then for a page in the database that maps to the URL. If neither is found, a 404 page is returned. As outlined previously, this all begins with the .htaccess, which forwards the request to index.php.
Here is what Magento expects from a URL:
http://www.somedomain.com/module/controller/action/param/value/param/value
A sample URL in Magento that the application maps to the pattern above:
http://www.somedomain.com/review/product/view/id/1
The controller that handles the processing for the URL, and actually renders the page, can be found under the following directory:
app/code/core/Mage/Review/controllers/ProductController.php
Note that the front controller is still your index.php file. In fact, since Magento is also object-oriented, ProductController is actually a controller PHP class that extends yet another controller, neither of which is considered the front controller.
Inside ProductController.php is a viewAction() method that helps to render the page view, and an initialization method that uses the product ID query string parameter and value to grab the appropriate product review. You can see how those relate to the actual URL string, and this design paradigm is extremely similar in front controller MVC frameworks. In order to get a better grasp of how this works, I invite you to download Zend Framework or CodeIgniter, and read the documentation. The CodeIgniter User Guide is very straightforward, and should give you a greater understanding of these concepts.
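To show how a URL in that shape could be taken apart, here is a simplified sketch. parseRoute() is a hypothetical helper, and the fallback to "index" for missing segments is my own assumption for the example; Magento's real router is considerably more involved.

```php
<?php
/**
 * Split a path of the form module/controller/action/param/value/... into parts.
 * Hypothetical helper for illustration only.
 */
function parseRoute(string $path): array
{
    // Break the path on slashes and drop any empty segments.
    $segments = array_values(array_filter(explode('/', trim($path, '/')), 'strlen'));

    $route = [
        'module'     => $segments[0] ?? 'index', // assumed default
        'controller' => $segments[1] ?? 'index', // assumed default
        'action'     => $segments[2] ?? 'index', // assumed default
        'params'     => [],
    ];

    // Remaining segments are read as alternating param/value pairs.
    for ($i = 3; $i + 1 < count($segments); $i += 2) {
        $route['params'][$segments[$i]] = $segments[$i + 1];
    }
    return $route;
}

$route = parseRoute('/review/product/view/id/1');
// "product" would correspond to ProductController and "view" to viewAction().
```

The id parameter ends up in $route['params'], which is the value an initialization method would use to look up the review.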
I strongly suggest you use a front controller in your applications, even if you do not use a framework. This architecture provides a high level of flexibility, and if you eventually do move to a framework, the transition over will be that much easier.
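As a starting point without a framework, a front controller can be as small as the sketch below. The $routes table and dispatch() function are hypothetical names, and the handlers just return strings to keep the example short.

```php
<?php
// Hypothetical route table: request path => function that builds the response.
$routes = [
    ''         => function (): string { return 'Home page'; },
    'about-us' => function (): string { return 'About us'; },
];

/**
 * Single point of entry: look up the path and run its handler,
 * or fall back to a 404 message when nothing is registered.
 */
function dispatch(array $routes, string $requestUri): string
{
    // Drop the query string and surrounding slashes before the lookup.
    $path = trim((string) parse_url($requestUri, PHP_URL_PATH), '/');

    if (isset($routes[$path])) {
        return $routes[$path]();
    }
    return '404 Not Found';
}

// In a real index.php this would be: echo dispatch($routes, $_SERVER['REQUEST_URI']);
```

Paired with the catch-all rewrite rule shown in the WordPress .htaccess, every request that is not a physical file would pass through dispatch().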
Thanks for the post Brian.
I understand you may have used the .htaccess file in the post as the easiest way to implement rewrite rules, but in order to avoid performance penalties you’re better off using the *Directory* directive since it’s only loaded when Apache starts.
See the section titled “When (not) to use .htaccess files” of the Apache tutorial here.
Claude,
This is a great piece of information. Thanks so much for passing it along. I assume many open-source applications rely on the .htaccess because individuals with shared hosting plans do not have access to the configuration file. However, for personal applications this is a really handy tidbit.
This is a great insight into a problem which I’ve spent a long time researching. While the post is about ‘SEO Friendly’ URLs, my requirement was slightly different: a front controller which provides a completely different mapping for every page.
I basically had an old .asp site, and the pages have been very well established in the search engines over 5-6 years. Instead of doing redirects en masse to the standard MVC ‘/controller/action’ style URL, I set up a controller which accesses a DB (url/pagetype) and lets me map all the old URLs to whatever PHP page(s) I choose.
I also had the requirement to setup a number of aliases/customised redirects which would send duplicate pages to the main source via a redirect, thus removing any problems of duplication in Google’s eyes. This is done in a second ‘redirects’ table, and also added some features in there such as pattern/query string redirects – to stop query string duplication and other problems which wouldn’t occur on a new site.
It’s been quite cumbersome, but it’s finally getting to a stage where I’d look at releasing it to help other URL-paranoid people migrate their sites.
@Nick
Thanks for sharing that information. I think that helps other developers through similar situations. I find that it is not an uncommon scenario, and one I come across often during a site replatform.
Hi!
Great article, thank you!
About Magento, could you please explain why this rewrite rule doesn’t work and shows me the 404 page?
RewriteRule ^topuri/top_proiectoare.html$ /catalog/product/view/id/78 [L]
SEO-friendly URLs are a bit of a myth. Search engines don’t care – they should be called human-friendly URLs: http://www.malcolmcoles.co.uk/blog/seo-friendly-urls-myth-and-fact/.