Broken links are a real problem -- they are one of the most annoying parts of the web, and they waste a lot of time. To give you one real-world example, take a look at Question 46 in the HSW question archive. This is a question on a rather delicate topic, but it is one of the most frequently visited questions in the archive.

When that question was originally published, it had 8 links associated with it at the bottom of the question. They were all interesting, relevant links. Within 5 months, all 8 of the links in Question 46 were broken, and the broken links generated a lot of angry email. New links were put in, they all broke in turn, and so on. Most recently, about 2 months ago, I went in and found 5 new relevant links and put them on the page. If you look at the page today, you will find that the first link is now broken. The middle 3 are OK. The last one has been moved, but the site does not tell you how to find it, so it too is broken. In just two months, 2 of the 5 links on this page (40%) have broken.

Stuff.dewsoftoverseas.com contains many thousands of links. Last week we ran a piece of code that checked for broken links. There were almost a thousand broken links on Stuff.dewsoftoverseas.com. We are now installing software that will help us detect and fix broken links on a daily basis. Several links break every day. In one extreme case, I ran a link as the Link of the Day, and the next day that link was broken!
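
The article does not show the code that was run, so the following is only a minimal sketch of what a broken-link check might look like, written in Python with the standard library. The function names (fetch_status, find_broken) and the rule that any answer outside the 2xx range counts as broken are assumptions made for illustration, not the actual tool used on the site.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request fails outright."""
    # HEAD asks for headers only, so the page body is never downloaded.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error such as 404
    except (urllib.error.URLError, OSError, ValueError):
        return None  # DNS failure, refused connection, timeout, or malformed URL


def find_broken(links, get_status=fetch_status):
    """Return (url, status) pairs for links that do not answer with a 2xx status."""
    broken = []
    for url in links:
        status = get_status(url)
        if status is None or not 200 <= status < 300:
            broken.append((url, status))
    return broken
```

Calling find_broken with a list of URLs returns the ones that need attention. Two caveats: some servers refuse HEAD requests, so a real checker would probably retry with GET, and a page that has moved while the old address still returns a normal "not here anymore" page (like the last link on Question 46) would not be caught by a status check alone.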

Why does this happen? Here are the three most common reasons:

  • Web site rearchitecture -- this is by far the most common reason. For example, last year NASA decided to consolidate and move many of their sites. When they did, they reorganized their directory structure and changed just about every URL across NASA. As a result, every link that Stuff.dewsoftoverseas.com had to any NASA page broke. Similarly, on Question 46 today, healthanswers.com has apparently renamed all of their files recently. As a result, any site or search engine anywhere on the web that linked to healthanswers.com now contains broken links. Since healthanswers.com doesn't forward the old links, it loses all of that traffic.
  • People change ISPs. If you build a little personal web site on AOL, and then change to another ISP and move your site in the process, anyone who linked to you gets a broken link.
  • Companies and universities have dynamic populations. When a student leaves a university, he or she loses the free Internet account on the university machines, and all links pointing to that student's site break.

The bottom line is that any time someone moves a site or changes the names of the files in a site in any way, everyone linking to the site ends up with a broken link. I witnessed this at a company several years ago. They changed many of the file names in the site during a site redesign. When they brought up the new site, their traffic was almost exactly half of what it had been before the redesign. HALF of their traffic was coming in from links elsewhere on the Internet! That makes sense -- the web is all about linking. They quickly built the ability to redirect all of the old file names to the new file names, and the next day the traffic was back to normal.
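
To make the redirect idea concrete, here is a rough sketch in Python of how old file names could be mapped to new ones. The paths in REDIRECTS, the redirect_target helper, and the RedirectHandler class are all hypothetical; the article does not say how the company actually built its redirects, and on most real sites this would be done in the web server's configuration rather than in application code.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical mapping from old file names to their new locations.
REDIRECTS = {
    "/products.htm": "/catalog/index.html",
    "/about_us.htm": "/company/about.html",
}


def redirect_target(path):
    """Return the new location for an old path, or None if the path was not moved."""
    return REDIRECTS.get(path)


class RedirectHandler(BaseHTTPRequestHandler):
    """Answers requests for old file names with a permanent redirect."""

    def do_GET(self):
        target = redirect_target(self.path)
        if target is None:
            self.send_error(404, "Not Found")
            return
        # 301 tells browsers and search engines that the move is permanent.
        self.send_response(301)
        self.send_header("Location", target)
        self.end_headers()
```

To try it locally, pass RedirectHandler to http.server.HTTPServer and call serve_forever(). Note that self.path includes any query string, so a real mapping would need to handle that. The point of the sketch is simply that one small table of old-to-new names keeps every inbound link working.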

If you have a site, you want to do your best to never change the file names or the directory structure (or if you do, provide automatic redirecting from the old names to the new ones). The easiest way to do that is to register your own domain name for the site, so its address does not change when you change ISPs, and leave the file names the same for eternity. If you don't, you lose a lot of traffic!

Here is an interesting link that describes a partial solution to the problem in XML: