A tip for Webmasters: Redirect your 404 pages for maximum SEO impact!

Monday, August 31, 2009

I migrate hundreds of websites every year, and one item in my standard list of tools is so often overlooked that I feel it needs to be revisited. I can't begin to tell you how many clients I run into every day who have moved their websites from one host to another, or from one technology to another, only to find themselves with a long list of "page not found" errors in Google searches. These same clients are usually also lamenting the fact that their "new" website, their pride and joy, does not perform as well in search rankings as the old website did!

This article addresses why this happens and what the savvy webmaster can do to save the day and ensure no click goes unhandled!

Why do website migrations or redesigns cause "page not found" errors?
When clients decide they want to redesign their website (or change hosts), they often forget that in doing so, many of the "old" pages will be replaced with "new" pages, often with different web addresses (URLs). Even if the same web content platform is retained, it is not uncommon for a redesign to change the site map of a website, sometimes pretty dramatically. This problem can be compounded by a change in web content delivery platform (like changing from PHP to ASP.NET) because the extensions of the web pages will most likely change from "*.php" or "*.html" to "*.aspx" and others. The result is that when these new websites go live, many inbound links from both Google and other websites will be broken. Over time, these broken links will be dropped from most of the indexes, but the impact on your search ranking can be quite large and it can last for a long time.

Here's what some actual web addresses might be before a redesign:
http://myCoolSite.com/prod.php
http://myCoolSite.com/aboutwidgetabc.htm

After a redesign, the above web addresses and others might be "enhanced", which is a common SEO technique:
http://myCoolSite.com/Product-Information-CoolWidget.aspx
http://myCoolSite.com/About-Widget-ABC.aspx

Notice the difference in URLs: any links to the "old" pages will be broken after the new site goes live. Some pages on the old website might also be dropped altogether, causing further errors.

How can Page-not-Found errors be handled?
Different server platforms offer different alternatives. There are two areas where these solutions will fall. First, there are HTTP level solutions which tend to be costly and difficult to implement and manage. Second, and the focus of this article, there are scripting solutions that are easily implemented by a webmaster.

On the HTTP level solution, the technique involves detecting the web page requests by the web server software and if necessary, handing those over to a processing engine of some kind. This technique works well, but also requires "low level" web server programming, a skill not usually associated with webmasters, especially those tasked with managing many servers and keeping them all running efficiently! The HTTP level redirect also might be costly depending on your resource structure, especially if you have to involve an expensive IT or programming resource.

However, the scripting solution is easier to accomplish because web servers generally have a configurable setting that lets you choose which page is displayed for the "page not found" error. Traditionally, the page-not-found page is customized with the website's logos and colors, but it can often do more with a little bit of script. The technique is simple. The script looks at the address of the page requested, and if this address is on a list of known old addresses, the script redirects to a more appropriate page!

Handling Page-Not-Found errors with ASP.NET script, an example for Windows Server & IIS

On Windows server, inside the properties for any website, we are able to configure the "Custom Errors" tab to display custom messages for different error conditions the server might encounter. Of course, a "page not found" is also a standard "404" error in web-server speak! So, on this screen, if you scroll down to the "404" error, notice that it can be "edited" to point to a custom page instead of the web server's default page.

Once on the edit window for the 404 error, notice it's possible to change the "Message Type" to "URL", meaning the web server will simply call a web address on your given website whenever a 404 condition is encountered. For this example, I am setting this value to "/404catch.aspx" which will be a script, on my website's root, that will handle the 404 errors.

Now, the web server will call your http://YourSite.com/404catch.aspx whenever a user requests an invalid page! Even better, the web server will usually pass you a query string specifying the user's original intended request. This is true of most web servers, although the syntax will be slightly different for different servers.

For IIS, the page is called like this:
http://YourSite.com/404catch.aspx?404;http://somesite.com:80/somepath/somepage.htm?someQuery=SomeValue

Note the query string, which is everything after the "?" that follows "404catch.aspx"!

The first part of the query string is the "404" error number. In our example it will always be "404" since we only trapped the "404" error - but if you use your script for more than one error, it would be possible to differentiate which error triggered your script by inspecting this part of the query string.

The second part tells you which web address (URL) the user was looking for when the "page not found" triggered. Notice that it looks pretty much like a standard URL, except that after the first part (usually called the "host") you are also passed a colon and the port number the website is responding under. The port number is usually of no consequence and I ignore it in my script!

However, especially with the second part of the query string, you can now do some string comparison to determine if this URL is one where you could redirect to another page or display a custom message!
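To make the parsing concrete, here is a minimal sketch of splitting that query string into its parts. It is written in Python purely for illustration and is not the 404catch.aspx code; the function name, the pattern and the dictionary keys are my own assumptions, and it only handles the IIS format shown above.

```python
import re

# Assumed shape of the IIS 404 query string:
#   "<error>;<protocol><host>[:port]<path>[?query]"
IIS_404_PATTERN = re.compile(
    r"^(?P<error_code>\d{3});"   # e.g. "404"
    r"(?P<protocol>https?://)"   # "http://" or "https://"
    r"(?P<host>[^/:?]+)"         # host name without the port
    r"(?::(?P<port>\d+))?"       # optional ":80", captured but usually ignored
    r"(?P<path>/[^?]*)?"         # e.g. "/somepath/somepage.htm"
    r"(?P<query>\?.*)?$"         # e.g. "?someQuery=SomeValue"
)


def parse_iis_404(raw_query):
    """Split the raw 404 query string into named parts.

    Returns a dict, or None when the string does not have the expected shape.
    """
    match = IIS_404_PATTERN.match(raw_query)
    if match is None:
        return None
    parts = match.groupdict()
    parts["path"] = parts["path"] or "/"    # a bare host means the site root
    parts["query"] = parts["query"] or ""   # no query string was passed
    return parts


parts = parse_iis_404(
    "404;http://somesite.com:80/somepath/somepage.htm?someQuery=SomeValue"
)
# The URL to compare against a redirect list, with the port left out.
requested_url = parts["protocol"] + parts["host"] + parts["path"]
```

With the example above, requested_url comes out as "http://somesite.com/somepath/somepage.htm", which is the string you would compare against your list of known addresses.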

You can see a complete listing for my version of 404catch.aspx here (also see update on 9/22/09 below.) My version receives the page request, parses out the Query String to determine the user's original intended page, then uses a database to decide if this "original" or "receiver" should be redirected to another page (the "target".)

Some items to note as you look at my script:
  1. I used a Regex to split the query string into several pieces of interest:

    • sErrorCode - will contain the number of the error that triggered the script
    • sProtocol - will contain "http://" or "https://" depending on how the page was called
    • sHost - will contain the "host" part of the URL, typically "somesite.com" (in the example above) sans any "path" that might follow.
    • sPath - will contain the "path" part of the URL, typically "/somepath/somepage.htm" in the example above.
    • sQueryString - will contain any query string the user might have passed, typically "?someQuery=SomeValue" in the example above.


  2. After the Regex, we look for the URL (protocol + host + path) in a redirect table for my website and if I find it, I redirect to a designated page. Notice this script can support several types of redirects:

    • 301 (permanent redirect)
    • 302 (temporary redirect)


  3. Notice also this script can return a custom 404 or an inline 404.
  4. Notice the redirect is done by changing the "response.status" property and calling the "response.AddHeader" method to send the special HTTP code to the requesting browser to "redirect" to a new page!
Of course, in a modified version of 404catch.aspx, it is possible to eliminate all the database code in favor of a text list or array that contains the receiver and target URLs and how they should be treated. The point of the example is not the database code, but the fact that one can catch the 404, call a custom script, and have that custom script figure out what page the user was trying to access, then decide if another page should be served instead OR a custom error displayed!
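The list-instead-of-database idea can be sketched roughly as follows. Again, this is Python for illustration only and not the code in 404catch.aspx. The table contents, the pairing of old and new addresses, the choice of 301 versus 302 and the lowercase matching are all assumptions made for the example.

```python
# Hypothetical in-memory redirect table: receiver URL -> (redirect type, target URL).
# Keys are lowercase because this sketch assumes receiver URLs match case-insensitively.
REDIRECTS = {
    "http://mycoolsite.com/prod.php": (
        "301", "http://myCoolSite.com/Product-Information-CoolWidget.aspx"),
    "http://mycoolsite.com/aboutwidgetabc.htm": (
        "302", "http://myCoolSite.com/About-Widget-ABC.aspx"),
}

# HTTP status lines for the two redirect types this sketch supports.
STATUS_LINES = {"301": "301 Moved Permanently", "302": "302 Found"}


def resolve(receiver_url):
    """Return (status line, headers) for a URL that triggered the 404 handler."""
    entry = REDIRECTS.get(receiver_url.lower())
    if entry is None or entry[0] not in STATUS_LINES:
        # Not in the table: fall through to a normal "page not found".
        return "404 Not Found", {}
    redirect_type, target = entry
    # A redirect is just a status line plus a Location header.
    return STATUS_LINES[redirect_type], {"Location": target}
```

For a receiver in the table, resolve returns the redirect status and a Location header, which are the two things the real script sets on the response. Anything else falls back to the 404.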

By the way, if you are interested in our "table" structure, you can download the DDL to create a table that will work with the 404catch.aspx script. If you choose to use a table, in your website's web.config, be sure to include the table name and the connection string to your database table with the following two lines:

<add key="RedirectManagerConn" value="Server=xxx.xxx.xxx.xxx;Database=YourDatabaseName;uid=YourUserName;pwd=YourPassword;Application Name=404Catch_RedirectManager"></add>
<add key="RedirectManagerTable" value="EZNP_RedirectManager_Redirects"></add>


So, with this information, you now have a powerful tool that will let you handle website migrations without losing a single click, and perhaps while retaining a lot of your search engine ranking! Good use of 301/302 redirects is a very powerful tool in the toolkit of the effective SEO-aware webmaster.

Update 9/22/09 - support for URL Masking

Weeks after I created this post, I had a need to "mask" certain URLs instead of performing 301 redirects. In this case, instead of the browser being redirected, we wanted the "contents" of the target page to be displayed in response to the vanity URL - without the user (or the browser) ever knowing that the page was not actually there. For instance, we might want the URL http://myCoolSite.com/prod.php to be displayed to the user BUT it might not exist and we want the contents from http://myCoolSite.com/aboutwidgetabc.htm to be returned. Masking the URLs in a 404 catch technique will do this job well and it turns out the original script already had 99% of all the code required to accomplish this!

For instance, notice the procedure "OutputRemote404", which was part of the original script. Its job in the original script was to fetch the "content" of some remote 404 page (perhaps on another website or server) and return it "inline" as the result of the current 404. This might be useful when you have more than one website in support of one project (say a BLOG and a website that are "designed" to work together). Rather than customizing a custom 404 page in two platforms, one could customize one and simply "call" the page from the other platform. Regardless of usage, this procedure accomplished the remote 404 by performing an HTTP "GetResponse" on the server side, which is a documented .NET method for reading HTTP from a web server. The result of that call to a remote server (using standard HTTP, by the way) is then stored in a string and finally output as the response to the current request.

Well, this was exactly what was needed for masking to work. We wanted to fetch the "content" of some remote page and display it "inline" as the result of the current request. The only change that was needed was to change the response code to "200 OK" (the valid response code for a successful HTTP request/response) so that the browser (and any analytics or log files) would not confuse this with a 404 error!
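In rough terms, masking boils down to "fetch the target on the server, then answer with 200 and its content". Here is a minimal Python sketch of that idea. It is not the actual "OutputRemote404" code, and the names serve_masked and http_fetch are hypothetical. The fetcher is passed in as a parameter so the sketch can be tried without a network connection.

```python
from urllib.request import urlopen


def http_fetch(url, timeout=10):
    """Fetch a URL on the server side and return the raw response body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def serve_masked(target_url, fetch=http_fetch):
    """Return (status line, headers, body) that serves the target's content inline.

    The browser never sees a redirect: it asked for the vanity URL and gets
    a normal "200 OK" response containing the target page's content.
    """
    body = fetch(target_url)
    # "text/html" is assumed here; other content types need extra handling.
    return "200 OK", {"Content-Type": "text/html"}, body
```

Calling serve_masked("http://myCoolSite.com/aboutwidgetabc.htm") would fetch that page over HTTP and hand back its content with a "200 OK" status instead of a 404.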

One more change needed since we are now returning "content" from our script (instead of a redirect) is that we need to set the content type of the response correctly. By default, this will be set to "text/html", which works well for returning plain old HTML pages. However, sometimes we want to return XML, or maybe even an image or PDF (since the masking is possible for "any" type of content). So, notice the new script has a procedure called "OutputInline" used to handle the output. This procedure is identical to "OutputRemote404", except it checks whether the extension of the file is ".xml". If it is, the Content Type of the response is set to "text/xml" so that the browser displays the content correctly. Of course, in order for this script to be truly complete, we'd need to add more types of content (images, PDFs, etc.). However, for brevity and to get this out quickly, I've only added XML handling for now. Also, my first choice would not be to hard code these extension checks (although that might be the easiest). Instead, I'd prefer something where the actual MIME mappings on the server are used, which would handle literally ANY content type the server is configured to return without needing specific code for each! This, however, will have to wait for another time when I can research it more thoroughly.
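As an illustration of the extension check, here is a small Python sketch that picks a content type from the target URL's extension. It is not the "OutputInline" procedure itself. Only the ".xml" to "text/xml" rule comes from the script described above; the other table entries and the function name are assumptions added to show how the list could grow.

```python
import posixpath
from urllib.parse import urlsplit

# Extension -> content type. Only ".xml" is handled by the script described above;
# the remaining entries are illustrative additions.
CONTENT_TYPES = {
    ".xml": "text/xml",
    ".pdf": "application/pdf",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".gif": "image/gif",
}


def content_type_for(url, default="text/html"):
    """Pick a content type from the file extension in the URL's path."""
    path = urlsplit(url).path                       # drops any "?query" part
    extension = posixpath.splitext(path)[1].lower()
    return CONTENT_TYPES.get(extension, default)
```

Anything not in the table, such as an ".aspx" page, falls back to "text/html", which matches the default behavior described above.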

Download the modified 404catch.aspx here, with support for masked URLs. Of note is that this script still supports the values 301, 302 and 404 in the database field "RedirectType" plus now an additional "inline" value that performs the masking instead of a redirect. The values of the other fields are exactly the same. So, one could easily change from 301 redirects to masks by changing the value of RedirectType to "inline" for the desired records in the database.

I hope you find this helpful! Let me know comments or questions below.

A lesson for large company managers: stop being large and start spending "your own" money!

Tuesday, August 18, 2009

I've been building systems for companies large and small for over 20 years. When I say large, I don't mean Microsoft-large, but certainly Fortune-1000-large. And by system, I mean anything sort-of computer or process, like websites, databases, business applications, business processes, etc. In this time, I've seen a lot. Maybe not all there is to see, but certainly a lot indeed. However, in this article, I am going to concentrate on one small detail of my overall experience.

In dealing with many clients large and small, there seems to be one cultural difference in the mindset of the managers that I've observed which clearly differentiates the small companies from the large companies. I can't believe it took me 20 years to notice this... but the pattern is clear not just in my dealings with business but also in the news over the last few years. Are you ready for my big reveal? Drum roll please!

When a small business builds a system, the success of the system is surely tied to the success of the business. If the business does poorly, the system tends to disappear. Sometimes, the system is a big part of the problem, but most times, the business model is flawed and no matter how good a system, the business will eventually fail.

However, when a small business does well, they inevitably work on their systems, good or bad, to make them better. Their success feeds their enthusiasm for better systems (if they have a vision for growth) and they work on them again and again to get even better at what they do. Small business managers have to watch every dollar they spend, so they spend it wisely (if they succeed) and constantly think about return on investment. They spend because they know a clear path to recover their dollar, often knowing in intimate detail the numbers they need to “hit” in order to make their money back! They constantly look for value, and they tend to innovate in doing so because they often have to compete with, and act like, big business without the budget of big business!

Small businesses tend to be led by entrepreneurs of course, and some of these small companies go on to become quite large. Think Amazon and Google for instance, both of which are very large now, but very much operate with an entrepreneurial mindset even to this day! At Google, they even have an engineered innovation path where they require (or at least encourage) their employees to spend a certain percentage of their paid time working on "pet" projects! You can probably think of other successful companies that still operate (and spend) like small businesses, at least in their entrepreneurial mindset. Of course, some of these small businesses spend quite a bit, the difference being they spend only when they have a clear picture of the return for their dollar! (Check out the book “What Would Google Do?” by Jeff Jarvis for more on the Google way.)

By the way, not surprisingly in small companies, every dollar spent (on systems or otherwise) is very close to an owner or a partner! Every dollar comes from his or her pocket and everyone close to them knows it. The outcome of this closely-held-money factor is that money is spent carefully and judiciously most times. Everyone at every level (since there tend to be so few "levels") is close to the money and, in effect, feels that it is their pocket too!

Yet, large businesses are often quite successful before they start thinking of systems. Some of these companies are behemoths in both size and age! I'm not talking 100 employees large, I am talking 1,000+ employees large! Some large businesses have not had entrepreneurs in generations, or if they still do, the entrepreneurs and their spirit are muffled by levels upon levels of bureaucracy and red tape. Managers routinely sign stacks of invoices for tens of thousands of dollars “automatically”. They send out RFPs to request systems they know will cost more than their own homes, many times with no clear understanding of the return on their dollars! Worse even, they approve proposals that they very well know (in the back of their minds) will actually cost 4 times as much and take twice as long to implement. And they do so calmly, knowing that when the projects fail (and the costs overrun, and the opportunities are lost for months at a time), they will have zero accountability. There is usually a vendor (either old or new), another team inside the company, or another manager they can blame! “It’s not my fault that the vendor did not know what they were doing!” they will rationalize to themselves and their peers later. It is almost as if the fact that they have millions of dollars in the bank (and each manager has levels upon levels of separation from the bank) makes them just plain dumb when it comes to spending that money! It is like spending Monopoly money to them: the money is not real, the money is not theirs - woo hoo, spend it like a drunken sailor!

Well, I have news for those managers. When a large project fails in a large business, it is your fault 100%. You released the RFP (whether you wrote it or not, understood it or not.) You approved the proposal even though you knew (or at least suspected) it was not possible. You failed to implement safeguards and alternatives and to protect your company from risk. You failed to engineer, then prototype and pilot, a clear path to return of every dollar (with a gain - somewhere!) You spent (not) your money and you did not care one bit about it because you rationalized it as “someone else’s problem to understand the details.” Plain and simple, you did not manage. Instead, you were managed by your own big head (and collectively the exuberance of your teams!) Welcome to corporate America. You’ve passed the test!

What would happen in my world? I’d go out of business. In a small business if you spend stupidly, you die stupidly!

I’m not saying large companies don’t need to spend a lot of money on systems (or otherwise). Large companies should indeed spend big money on their business but only when they have a clear justification and an even clearer path to return every single dollar plus more – somehow direct or indirect! Every manager that spends a dollar should be held to the scrutiny of that expenditure. Every project should have a clear assessment of need, risk and gain. Every project should have a clear time line that means something (not an ever shifting calendar) and milestones that are measurable and measured (not ignored.) Every project should be prototyped, piloted and proven in small scale every single time (yes, it can be done no matter the project size – especially the larger the project and the larger the expense.)

Many books have been written on successful project management and budgeting. I don’t claim to be an expert in that area though I can hold my own. I certainly won’t go into it here as you can pick up a book (or two, or ten) and go to town. There is certainly not a short supply of project management methods to follow, though there definitely seems to be a short supply of people practicing the techniques!

However, here is perhaps an idea that might help. This is “my” corporate bailout plan; my economic recovery plan if you would.

Put every manager (and their entire teams) real close to the bank! If a manager wants to spend a dollar for the company, make that dollar mean something to their bottom line! In some direct correlation to the revenues of the company, reduce that manager’s salary (and their team’s salary) by the same percentage. Then, as they recover each dollar, give them back their salary at the same rate they earn back the company’s money! If they actually recover their money and more, give them a share of that surplus. Make them care about the success with the most primitive incentive in the world: their pockets, or their fuller pockets, that is! If managers are willing to risk the company's pocket, they should be willing to risk their own!

My guess is that overnight, you’ll start to see a sudden interest in project success at every level of the company! Suddenly everyone will care tremendously about each and every dollar spent. When the decision making is right on each manager’s pocket (and their teams) you will start to see an unheard of zeal in protecting that money and a huge push to innovation and to making more with less!

You think I am crazy, don’t you? Perhaps I am! You say: how could I reduce a manager’s salary (and their team’s salary) by 1% or 3% or 4% or 5%? Easy… start today and start now, and you will see that where managers once wanted to spend $100K on a project, suddenly they’ll have $10K solutions! Put the money in their pocket and they’ll spend it as if it were their own! Over the long run, you might not spend less (ideally you still spend big bucks!). However, your returns will be much larger than before, with waste crushed through hyper-participative-project-management!

Hey, if there's ever a time when you'll be able to hire people with creative compensation plans, now is definitely that time!

I have a business partner that has told me many times: "you cannot save idiots from themselves!" Perhaps this should be the latest business fad to make it across corporate America.