mod_rewrite, a beginner's guide (with examples)

  

Until recently, I only had the vaguest of ideas of what mod_rewrite was, and I certainly
   had no clue about how to use it.  So, when I started designing this site, I decided to
   delve into the wonders that are the mod_rewrite Apache module.


So, what is mod_rewrite for?



   Simply put, mod_rewrite rewrites a URL at the server level, so that the user is served the
   output of a different page from the one they asked for.  So, for example, a user may ask for
   http://www.somesite.com/widgets/blue/, but will really be given
   http://www.somesite.com/widgets.php?colour=blue by the server.  Of course, the
   user will be none the wiser to this little bit of chicanery.



   On workingwith.me.uk, I use mod_rewrite to
   redirect all pages to one central PHP page, which then loads the data that the user wanted
   from an external data file.  Lots of people use mod_rewrite to show an "alternative" image
   when people are hotlinking directly to their images.


What do I need to get mod_rewrite working?



   There's pretty much only one thing you'll need to get mod_rewrite working for you,
   and that's to have the mod_rewrite module installed on your Apache server!  


   For the
   purpose of this article, I'm going to assume that you don't have access to view or edit
   the Apache server httpd.conf file, so the easiest way to check whether the mod_rewrite
   module is installed will be to look on your phpinfo page.  If you've not already created
   one of these for yourself, just copy and paste the following code into a new text file
   using your favourite text editor, save it as phpinfo.php, and upload it to
   your server:


<?php phpinfo(); ?>




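   Load that page up in your web browser, and perform a search for "mod_rewrite".  All being well,
   you'll find it in the "Apache loaded modules" section of the page.  (That section only appears
   when PHP is running as an Apache module; under CGI or FastCGI it won't be shown, whether or
   not mod_rewrite is installed.)  If the section is there but mod_rewrite isn't in it,
   you'll have to contact your hosting company and politely ask them to add it to the Apache
   configuration.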



   Assuming the mod_rewrite module is loaded, then you're good to go!


A simple mod_rewrite example



   So, let's write a simple mod_rewrite example.  This isn't going to be anything fancy; we're
   just going to redirect people who ask for alice.html to the page
   bob.html instead.  First, let's create the Alice and Bob pages.  Below is
   Alice's webpage - create a similar one for Bob.


<html>
   <head>
      <title>Alice's webpage</title>
   </head>
   <body>
      <p>
         This is Alice's webpage
      </p>
   </body>
</html>




  Upload both of these to your web server, and check that you can view both of them.  Now
  comes the fun - we're going to add a couple of lines to your .htaccess file.
  The .htaccess file is a text file which contains Apache directives.  Any directives which
  you place in it will apply to the directory which the .htaccess file sits in, and any
  below it.  To ours, we're going to add the following:


RewriteEngine on
RewriteRule ^alice\.html$ bob.html




   Upload this .htaccess file to the same directory as alice.html and bob.html, and reload
   Alice's page.  You should see Bob's page being displayed, but Alice's URL.  If you still see
   Alice's page being displayed, then check you've followed the instructions correctly (you
   may have to clear your cache).  If things still aren't working for you, then contact your
   technical support people and ask them to enable mod_rewrite and the FileInfo override in
   their httpd.conf file for you.
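
   When the module or the override is missing, a common symptom is a "500 Internal Server
   Error" on every page in the directory, because Apache refuses the RewriteEngine directive.
   A minimal sketch of one way to guard against that, assuming you would rather have the rule
   quietly skipped than have the whole directory break, is to wrap the directives in an
   IfModule block:

```apache
# Only apply the rewrite when mod_rewrite is actually loaded;
# otherwise Apache skips this block and alice.html is served as normal.
<IfModule mod_rewrite.c>
   RewriteEngine on
   RewriteRule ^alice\.html$ bob.html
</IfModule>
```

   The trade-off is that a missing module now fails silently, so it can be worth taking the
   wrapper out again while you're debugging.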


The structure of a RewriteRule


RewriteRule Pattern Substitution [OptionalFlags]




   The general structure of a RewriteRule is fairly simple if you already understand
   regular expressions.  This article isn't intended to be a tutorial about regular
   expressions though - there are already plenty of those available.  RewriteRules are
   broken up as follows:



    
   RewriteRule

      This is just the name of the directive.

   Pattern

      A regular expression which will be applied to the "current" URL.  If any RewriteRules
      have already been performed on the requested URL, then that changed URL will be the
      current URL.  In a .htaccess file, the URL being matched is the path relative to the
      directory which the .htaccess file sits in, with no leading slash.

   Substitution

      The string which replaces the URL when the pattern matches.  Substitution occurs in
      the same way as it does in Perl, PHP, etc.

      You can include backreferences and server variable names (%{VARNAME})
      in the substitution.  Backreferences to this RewriteRule should be written as
      $N, whereas backreferences to the last matched RewriteCond should be written
      as %N.

      A special substitution is -.  This substitution tells Apache to not
      perform any substitution.  I personally find that this is useful when using the
      F or G flags (see below), but there are other uses as well.

   OptionalFlags

      This is the only part of the RewriteRule which isn't mandatory.  Any flags which you
      use should be surrounded by square brackets, and comma separated.  The flags which
      I find to be most useful are:

         * F - Forbidden.  The user will receive a 403 error.

         * G - Gone.  The user will receive a 410 error.

         * L - Last Rule.  No more rules will be processed if this one was successful.
           (In a .htaccess file, the rewritten URL may still be run through the rules
           again from the top.)

         * R[=code] - Redirect.  The user's web browser will be visibly redirected to the
           substituted URL.  If you use this flag, it is safest to prefix the substitution
           with http://www.somesite.com/, thus making it into a true URL; if you don't,
           Apache builds the URL from the current host name, which in a .htaccess file may
           not give the path you expect.  If no code is given, then an HTTP response of
           302 (moved temporarily) is sent.

      A full list of flags is given in the Apache mod_rewrite manual.
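
   To make the substitution and flag descriptions above a little more concrete, here is a
   hedged sketch of three rules.  The old/, new/ and old-page.html names are purely
   illustrative, and the widgets URLs simply reuse the somesite.com example from the start
   of this article:

```apache
RewriteEngine on

# $1 is whatever the brackets in this rule's own pattern matched.
# R=301 sends a visible "moved permanently" redirect instead of the default 302.
RewriteRule ^old/(.+)$ http://www.somesite.com/new/$1 [R=301,L]

# %1 is whatever the brackets in the RewriteCond just above the rule matched.
# The trailing ? drops the original query string from the redirect target.
RewriteCond %{QUERY_STRING} ^colour=([a-z]+)$
RewriteRule ^widgets\.php$ http://www.somesite.com/widgets/%1/? [R=301,L]

# "-" leaves the URL alone; the G flag simply answers "410 Gone".
RewriteRule ^old-page\.html$ - [G]
```

   One caution on the second rule: if the same .htaccess file also rewrites widgets/blue/
   back to widgets.php?colour=blue internally, the two rules can chase each other in a
   loop unless you add a further condition.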
           

        



A slightly more complicated mod_rewrite example



  Let's try a slightly more meaty example now.  Suppose you have a web page which takes
  a parameter.  This parameter tells the page how to be displayed, and what content to pull
  into it.  Humans don't tend to like remembering the additional syntax of query strings for
  URLs, and neither do search engines.  Both seem to much prefer a straight
  URL, with no extra bits tacked onto the end.


   In our example, you've created a main index page which takes a page parameter.
   So, a link like index.php?page=software would take you to a software page,
   while a link to index.php?page=interests would take you to an interests page.
   What we'll do with mod_rewrite is to silently redirect users from
   page/software/ to index.php?page=software etc.   



   The following is what needs to go into your .htaccess file to accomplish that:


RewriteEngine on
RewriteRule ^page/([^/\.]+)/?$ index.php?page=$1 [L]




   Let's walk through that RewriteRule, and work out exactly what's going on:


    
   ^page/

      Sees whether the requested page starts with page/.  If it doesn't,
      this rule will be ignored.

   ([^/\.]+)

      Here, the enclosing brackets signify that anything that is matched will be
      remembered by the RewriteRule.  Inside the brackets, it says "I'd like one or
      more characters that aren't a forward slash or a period, please".  Whatever is found
      here will be captured and remembered.

   /?$

      Makes sure that the only thing that is found after what was just matched is a
      possible forward slash, and nothing else.  If anything else is found, then this
      RewriteRule will be ignored.

   index.php?page=$1

      The actual page which will be loaded by Apache.  $1 is magically
      replaced with the text which was captured previously.

   [L]

      Tells Apache to not process any more RewriteRules if this one was successful.
           

        




   Let's write a quick page to test that this is working.  The following test script will simply
   echo the name of the page you asked for to the screen, so that you can check that the
   RewriteRule is working.

<html>
   <head>
      <title>Second mod_rewrite example</title>
   </head>
   <body>
      <p>
         The requested page was:
         <?php echo isset($_GET['page']) ? htmlspecialchars($_GET['page']) : ''; ?>
      </p>
   </body>
</html>




   Again, upload both the index.php page, and the .htaccess file to the same directory.
   Then, test it!  If you put the page in http://www.somesite.com/mime_test/,
   then try requesting http://www.somesite.com/mime_test/page/software.  The
   URL in your browser window will show the name of the page which you requested, but the
   content of the page will be created by the index.php script!  This technique
   can obviously be extended to pass multiple query string parameters to a page - all you're limited
   by is your imagination.
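
   As a sketch of that extension, the rules below assume a second, hypothetical parameter
   called section, so that page/software/linux/ is handed to
   index.php?page=software&section=linux.  Neither the parameter name nor the URL layout
   comes from the example above; they are only there for illustration:

```apache
RewriteEngine on

# Two path segments: page/software/linux/ -> index.php?page=software&section=linux
RewriteRule ^page/([^/\.]+)/([^/\.]+)/?$ index.php?page=$1&section=$2 [L]

# One path segment: the same rule as in the example above
RewriteRule ^page/([^/\.]+)/?$ index.php?page=$1 [L]
```

   Adding QSA to the flags ([L,QSA]) would also carry across any query string the visitor
   supplied themselves, which these rules otherwise discard.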


Conditional Statements and mod_rewrite



   But what happens when you start getting people hotlinking to your images (or other files)?
   Hot linking is the act of including an image, media file, etc from someone else's server
   in one of your own pages as if it were your own.  Obviously, as a webmaster, there are
   plenty of times when you don't want people doing that.  You'll almost certainly have seen examples
   where someone has linked to one image on a website, only for a completely different,
   "nasty" one to be shown instead.  So, how is this done?



   It's pretty simple really.  All it takes are a couple of RewriteCond statements in your
   .htaccess file.



   RewriteCond statements are as they sound - conditional statements for RewriteRules.
   The basic format for a RewriteCond is RewriteCond test_string cond_pattern.
   For our purpose, we will set the test_string to be the HTTP_REFERER.  If the test string
   is neither empty nor our own server, then we will serve an alternative (low bandwidth)
   image, which tells the person who is hotlinking off for stealing our bandwidth.



   Here's how we do that:


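RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?somesite\.com/.*$ [NC]
# Leave requests for nasty.gif itself alone, or hotlinkers get stuck in a redirect loop
RewriteCond %{REQUEST_URI} !/nasty\.gif$
RewriteRule \.(gif|jpg|png)$ http://www.somesite.com/nasty.gif [R,L]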




   Here, the RewriteRule will only be performed if all the preceding RewriteConds are
   fulfilled.  In the second RewriteCond, [NC] simply means "No Case", so it
   doesn't matter whether the domain name was written in upper case, lower case or a mixture
   of the two.  So, any requests for gif, jpg or png files from referers other than somesite.com
   will result in your "nasty" image being shown instead.  


   The [R,L] in
   the RewriteRule simply means "Redirect, Last".  So, the RewriteRule will visibly redirect
   output to "nasty.gif" and no more RewriteRules will be performed on this URL.


   If you simply don't want the hot linkers to see any image at all when they hot link to your
   images, then simply change the final line to
   RewriteRule \.(gif|jpg|png)$ - [F].  The - means "don't rewrite
   the requested URL", and the [F] means "Forbidden".  So, the hot linker will
   get a "403 Forbidden" message, and you don't end up wasting your bandwidth.
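
   Because every RewriteCond has to pass before the rule fires, allowing another site to use
   your images is just a matter of adding one more condition.  A minimal sketch, assuming a
   hypothetical partner site at friendlysite.com, and assuming either site might be served
   over https as well as http:

```apache
RewriteEngine on
# An empty referer (typed-in URLs, some privacy tools) is let through
RewriteCond %{HTTP_REFERER} !^$
# Our own pages, with or without www, over http or https
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?somesite\.com/ [NC]
# The hypothetical partner site which is allowed to hotlink
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?friendlysite\.com/ [NC]
# Everyone else gets a 403 for image requests
RewriteRule \.(gif|jpg|png)$ - [F]
```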


Conclusion



  mod_rewrite is an incredibly handy tool to have in your arsenal.  This article only
  scratched the surface of what is possible with mod_rewrite, but should have given you
  enough information to go out and start mod_rewriting history yourself!
