301 Redirects in Sitecore using Xenu and Lucene

The Problem

Most of the Sitecore implementations I’ve built over the years have been for clients that already had a website and wanted to move it to a more robust platform (in my case, Sitecore).  When converting these sites, almost every time some, if not all, of the URLs on the new Sitecore site differ from those on the previous site.  There are a number of reasons why URLs may change:

  • Sitecore URLs normally end in .aspx; the previous site may have had a different extension
  • the Information Architecture for the site may have changed

Whatever the reason, it’s important to ensure that users who request old URLs still reach the information they are looking for.  Search engines also need to know where your content has moved to.  What’s the solution that addresses both of these concerns?  301 redirects!

The Solution

There are many ways to implement 301 redirects for your application.  This post covers how to create 301 Redirect items in Sitecore that represent each of the old URLs on the previous version of the site.  I will discuss:

  • how to import these old URLs, creating an item for each and automatically storing the relevant part of the old URL
  • how the 301 Redirect item maps to a new item in Sitecore
  • how to handle performance issues that may arise

Xenu

Let’s begin by discussing how to import the old URLs.  I’ve found that a good tool for the job is Xenu.  Xenu is a free application that crawls a website and gathers information such as its URLs, sitemap, and so on.  Run Xenu against the old site and, when it finishes, go to File > Export to TAB separated file…  This generates a TAB-delimited TXT file that you can open in Excel.  Here are a few helpful tips when using Xenu.

Note: Xenu will include all URLs it can find on your site, including URLs to CSS files, JS files and images.  You’ll want to exclude these results from the list that you import, since you really don’t need to set up 301 redirects for these types of files.
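
If you would rather not clean the list by hand in Excel, the filtering can be scripted.  Here is a minimal sketch, assuming the URL sits in the first TAB-delimited column of each export line; check that against your own export before relying on it.  The class name XenuExportFilter and the extension list are my own placeholders, so adjust them for your site.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class XenuExportFilter
{
    // Extensions treated as static assets (an assumed list; extend as needed).
    private static readonly HashSet<string> AssetExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            ".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico", ".svg"
        };

    // Returns only the http/https page URLs from the export lines.
    // Assumption: the URL is the first TAB-delimited column on each line.
    public static IEnumerable<string> PageUrls(IEnumerable<string> exportLines)
    {
        foreach (var line in exportLines)
        {
            var url = line.Split('\t')[0].Trim();

            // Skips the header row, blank lines and anything that is not an absolute URL.
            Uri uri;
            if (!Uri.TryCreate(url, UriKind.Absolute, out uri)) continue;

            // Skips mailto:, ftp: and similar links.
            if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps) continue;

            // Takes the extension from the last path segment, if it has one.
            var path = uri.AbsolutePath;
            var dot = path.LastIndexOf('.');
            var extension = dot > path.LastIndexOf('/') ? path.Substring(dot) : string.Empty;
            if (AssetExtensions.Contains(extension)) continue;

            yield return url;
        }
    }
}
```

Pass it the lines of the exported file, for example XenuExportFilter.PageUrls(File.ReadLines(path)), and import only what comes back.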

Importing into Sitecore

Once you have your list of URLs cleaned up, you’ll need to import them into Sitecore.  The template that you import your items into will be very simple.  I recommend a template called 301 Redirect with two fields: Old URL (single-line text field) and Redirect Link (general link field).  When performing the import, only import the part of the old URL that follows the domain name, that is, everything after the .com (or .net, .org, whatever your domain ends in).  This will reduce the amount of data you import and will make searching with Lucene much easier.
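
Trimming the domain off each URL is easy to get wrong by hand, so here is a small sketch that lets System.Uri do it.  The class and method names are placeholders of my own.  I’m assuming you want to keep the query string as part of the Old URL value; drop it if your old site did not rely on it.

```csharp
using System;

public static class OldUrlPath
{
    // Returns the part of an absolute URL that follows the host name,
    // including the leading slash and any query string.
    public static string FromAbsolute(string absoluteUrl)
    {
        var uri = new Uri(absoluteUrl, UriKind.Absolute);
        return uri.PathAndQuery;
    }
}
```

For example, http://www.example.com/products/widget.php?id=3 is stored as /products/widget.php?id=3.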

Note: You’ll be importing and creating a Sitecore Item for each old URL.  The item name doesn’t really matter, since you’ll be searching against the Old URL field rather than the item name.  It does still have to be a valid Sitecore item name, though, so make sure you use the ItemUtil.ProposeValidItemName method when creating the 301 Redirect items.  Also, if you need to generate a large number of 301 Redirect items, group them into folders of about 100 items each rather than placing them all under a single parent.
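
To make the note above concrete, here is a rough sketch of an import loop.  It is not the code I used on any particular project: the class name, the folder naming scheme and the two template ID parameters are all placeholders, and it assumes the calling code runs with write access to the target database.

```csharp
using System.Collections.Generic;
using Sitecore.Data;
using Sitecore.Data.Items;

public static class RedirectImporter
{
    private const int ItemsPerFolder = 100;

    // root: the item under which the redirect folders are created.
    // folderTemplate / redirectTemplate: IDs of your own two templates (placeholders).
    public static void Import(Item root, IList<string> oldPaths,
                              TemplateID folderTemplate, TemplateID redirectTemplate)
    {
        Item folder = null;

        for (var i = 0; i < oldPaths.Count; i++)
        {
            // Start a new folder for every block of 100 items.
            if (i % ItemsPerFolder == 0)
            {
                var folderName = "Redirects " + ((i / ItemsPerFolder) + 1).ToString("D4");
                folder = root.Add(folderName, folderTemplate);
            }

            // The name only has to be valid; the numeric prefix keeps it non-empty and unique.
            var name = ItemUtil.ProposeValidItemName("redirect " + i + " " + oldPaths[i]);
            var redirect = folder.Add(name, redirectTemplate);

            // Store the old path in the field that the Lucene index will search.
            redirect.Editing.BeginEdit();
            redirect["Old URL"] = oldPaths[i];
            redirect.Editing.EndEdit();
        }
    }
}
```

The Redirect Link field is left empty here on purpose; mapping each old URL to its new item is usually a manual or spreadsheet-driven step.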

Lucene Index and Searching

The next step is to create an index specifically for these 301 Redirect items so that you can search against it.  When searching, you will perform a field value search on the Old URL field.  I recommend using the Advanced Database Crawler project written by Alex Shyba.  This code library will make searching Sitecore using Lucene much, much easier!

How and when to perform the 301 Redirect

The 301 Redirect items come into play when a user requests an old URL for which Sitecore has no item under your Home item to load.  To handle that case, add a Processor to the httpRequestBegin pipeline right after the ItemResolver Processor:

<processor type="Sitecore.Pipelines.HttpRequest.ItemResolver, Sitecore.Kernel" />

Putting your class here ensures that Sitecore has already had a chance to resolve the Item requested by the user.  If Sitecore can’t find the Item, you can perform a Lucene field value search against the 301 Redirects index for the URL the user requested.  If the search returns a result, read its Redirect Link field and redirect the user to the new URL with a 301 status.
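
The flow described above can be sketched roughly as follows.  This is an illustration only, not the downloadable class mentioned below: the namespace, the class name and the FindNewUrl method are placeholders, and the Lucene lookup itself is deliberately left as a stub for you to fill in with your own index search.

```csharp
using System.Web;
using Sitecore.Pipelines.HttpRequest;

namespace Example.Redirects // placeholder namespace
{
    // Register in the httpRequestBegin pipeline directly after ItemResolver, e.g.:
    // <processor type="Example.Redirects.Redirect301Processor, Example.Redirects" />
    public class Redirect301Processor : HttpRequestProcessor
    {
        public override void Process(HttpRequestArgs args)
        {
            // ItemResolver already found an item, so this is not an old URL.
            if (Sitecore.Context.Item != null) return;

            var http = HttpContext.Current;
            if (http == null) return;

            // RawUrl is the path and query string, matching what was stored in Old URL.
            var newUrl = FindNewUrl(http.Request.RawUrl);
            if (string.IsNullOrEmpty(newUrl)) return;

            // Sends a 301 with a Location header and ends the response.
            http.Response.RedirectPermanent(newUrl);
        }

        // Placeholder: search the 301 Redirects index for an item whose Old URL
        // field matches oldUrl and return the URL of its Redirect Link target.
        // Returning null means "no redirect found".
        protected virtual string FindNewUrl(string oldUrl)
        {
            return null;
        }
    }
}
```

When no redirect is found the processor simply returns, so the rest of the pipeline runs and the request ends up on your normal "not found" handling.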

Here is a link to download the C# class I wrote, Processor.  This will get you started with the pipeline processor you’ll need.

Also, here is a link to download a Sitecore package that will install two templates:

  1. 301 Redirects Folder
  2. 301 Redirect

Download the package here, 301 Redirect Templates-1.0.

If you have any questions or suggestions, please post them in the comments section; I’d love to hear your questions and feedback on this approach.
