Virtually every website has crawl errors, and that's not an exaggeration.
Unfortunately, this doesn't mean that you can feel free to ignore them.
In fact, 404s and other crawl errors can be disastrous for your site under some circumstances.
But before you rush out and redirect all of your 404s to the homepage, it's important to understand that this is often the wrong move.
So, let's talk about those pesky crawling errors, what they mean for your site, and what to do about them.
For Starters, Google Would Rather See a Proper 404 than a "Soft" 404
404s are a perfectly normal part of the web...In fact, we actually prefer that, when you get rid of a page on your site, you make sure that it returns a proper 404 or 410 response code (rather than a "soft 404")...The fact that some URLs on your site no longer exist / return 404s does not affect how your site's other URLs...perform in our search results.
Another example [of a "soft" 404] is when a site redirects any unknown URLs to their homepage instead of returning 404s. [This] can have negative effects on our understanding and indexing of your site, so we recommend making sure your server returns the proper response codes for nonexistent content.
A 404 does not necessarily mean that your site is broken or that you are a sloppy webmaster. If the content doesn't exist anymore, Google is explicitly telling you that they do not want you to redirect it to the homepage. Moreover, redirecting all 404 pages to the homepage is bad for the user experience. If a user clicks on a link to a piece of content that doesn't exist, they would much rather be told it doesn't exist than spend the next hour or so digging through your site looking for it, only to come up empty-handed and very frustrated.
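To make the distinction concrete, here is a minimal sketch of how you might classify the response your server gives for a URL you know doesn't exist. The function name, the labels it returns, and the example.com URLs are all made up for illustration; the only assumption is that you already have the status code and any Location header in hand.

```python
from typing import Optional
from urllib.parse import urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def classify_missing_url(status: int, location: Optional[str], homepage: str) -> str:
    """Classify the response to a request for a URL known NOT to exist.

    Hypothetical helper: the labels are illustrative, not Google terminology.
    """
    if status in (404, 410):
        return "proper"  # a real "not found" / "gone" signal
    if status in REDIRECT_CODES and location is not None:
        target = urlsplit(location)
        home = urlsplit(homepage)
        # A relative Location ("/") has no host, so treat it as same-site.
        same_host = target.netloc in ("", home.netloc)
        if same_host and (target.path or "/") == (home.path or "/"):
            return "soft-404"  # unknown URL bounced to the homepage
        return "redirect"
    if status == 200:
        return "soft-404"  # missing content served as if it were fine
    return "other"


print(classify_missing_url(404, None, "https://example.com/"))
print(classify_missing_url(301, "https://example.com/", "https://example.com/"))
```

The first call is the behavior Google asks for; the second is the "redirect everything to the homepage" pattern described above.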
Not every 404 or crawl error needs to be "fixed," so don't approach these with that mindset.
That said, 404s can be detrimental under the wrong circumstances, so let's talk about when.
When a Crawl Error Can't Be Ignored
There are a few important circumstances where crawl errors can't be ignored:
- The page actually does exist (or it's supposed to).
- Users encounter 404s when they use your site, i.e., you have internal links that point to 404 pages.
- The missing page has links pointing to it or traffic visiting from external pages, which could be frustrating for users, and could mean that you are missing out on SEO value.
Let's tackle these one by one.
The Page Actually Does Exist
If the page actually does exist, the last thing you want is for Google to see it as a 404 page, since it will drop out of their index if it hasn't already. If a page that exists appears to return a 404, it generally means that the URL being requested is incorrect. In other words, the link is pointing to the wrong place.
In this case, you want to use a 301 redirect so that users will land at the proper location, and so that the proper page gets credit in the search results.
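As a rough sketch of that logic, assuming a small hand-maintained redirect map (the paths and the `respond` function below are hypothetical, and a real site would normally do this in the web server or CMS configuration instead):

```python
from http import HTTPStatus

# Hypothetical data: pages that exist, and wrong URLs mapped to the right ones.
EXISTING_PAGES = {"/", "/blog/crawl-errors"}
REDIRECTS = {"/blog/crawl-erors": "/blog/crawl-errors"}


def respond(path):
    """Return (status, headers) for a requested path."""
    if path in EXISTING_PAGES:
        return HTTPStatus.OK, {}
    if path in REDIRECTS:
        # Permanent redirect: users and crawlers land on the proper page.
        return HTTPStatus.MOVED_PERMANENTLY, {"Location": REDIRECTS[path]}
    # Anything else is genuinely missing, so say so with a real 404.
    return HTTPStatus.NOT_FOUND, {}


status, headers = respond("/blog/crawl-erors")
print(int(status), headers)
```

Note that only the known-wrong URL is redirected. Everything else that doesn't exist still gets a proper 404.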
If Internal Links Point to 404s
An internal link to a 404 page is irritating for users, and if there are a lot of them, Google may even view this as a sign of a poorly maintained site.
You can use Screaming Frog to crawl your site and discover any internal links to 404 pages, or other pages with error codes. It will also tell you which pages are linking to the missing page.
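If you're curious what that report boils down to, here is a small sketch. It is not Screaming Frog's API or export format. It simply assumes you already have crawl data in two hypothetical dictionaries: one mapping each page to the internal links it contains, and one mapping each crawled URL to its status code.

```python
from collections import defaultdict


def broken_internal_links(links, status):
    """Map each broken target URL to the list of pages that link to it.

    links:  {source_page: [target_url, ...]}  (hypothetical crawl output)
    status: {url: http_status_code}           (hypothetical crawl output)
    """
    report = defaultdict(list)
    for source, targets in links.items():
        for target in targets:
            code = status.get(target)
            # Skip URLs that were never crawled; flag 4xx and 5xx responses.
            if code is not None and code >= 400:
                report[target].append(source)
    return dict(report)


links = {
    "/": ["/about", "/old-pricing"],
    "/blog": ["/old-pricing", "/contact"],
}
status = {"/about": 200, "/old-pricing": 404, "/contact": 200}

for target, sources in broken_internal_links(links, status).items():
    print(target, "is linked from", sources)
```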
If an internal page links to a 404, there are a few different reasons this might happen:
- The page it links to is gone
- The page it links to was moved and never redirected (if it had been redirected, the link would return a 301 or 302 rather than a 404)
- The link contains a typo
If the page is actually gone, all internal links to the page should simply be removed.
If the page was moved, the old URL should 301 (ideally, as opposed to 302) to the new page. However, the redirect alone isn't ideal. While it doesn't make much difference from the user's perspective, the fact is that 301s are treated as links by Google. PageRank dilutes every time it passes through a link, so you get more SEO value if all internal links point directly to the new page, not the old one, even with the redirect in place.
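One way to picture updating those internal links: follow each redirect to its final destination and point the link there directly. This is a minimal sketch that assumes you have your redirects as a simple old-to-new mapping. The function names and paths are invented for the example.

```python
def final_destination(url, redirects, max_hops=10):
    """Follow a chain of redirects and return where it ends up."""
    seen = set()
    # Stop on loops and on suspiciously long chains.
    while url in redirects and url not in seen and len(seen) < max_hops:
        seen.add(url)
        url = redirects[url]
    return url


def rewrite_links(links, redirects):
    """Return a copy of the link map with every target replaced by its final URL."""
    return {
        page: [final_destination(target, redirects) for target in targets]
        for page, targets in links.items()
    }


# Hypothetical redirect map: "/old" was moved twice.
redirects = {"/old": "/interim", "/interim": "/new"}
links = {"/": ["/old", "/about"]}

print(rewrite_links(links, redirects))
```

The redirects stay in place for external visitors. The point is only that your own pages stop relying on them.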
Finally, if I may play Captain Obvious, if the reason for the 404 is a typo in an internal link, the typo should be fixed.
If External Pages Link to a 404
So, what do you do if other sites are linking to a 404 page? Do you let the link go to waste?
If the missing page really is gone, a redirect is unnecessary, for the reasons discussed above. However, if the missing page has a lot of SEO value, it would be wise to restore it, or place a similar, updated page on the same URL that serves a similar purpose.
Oftentimes what happens is somebody else will link to your site, but the link will contain a typo. Under these circumstances, you should set up a 301 redirect so that the misspelled URL points to the correct URL, especially if the link is valuable.
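If you have a long list of misspelled inbound URLs, you can get a head start on matching them to real pages with fuzzy string matching. The sketch below uses Python's standard `difflib` module. The `suggest_redirect` helper, the 0.8 cutoff, and the sample paths are assumptions for illustration, and every suggestion should be reviewed by a human before it becomes a 301.

```python
from difflib import get_close_matches


def suggest_redirect(bad_path, known_paths, cutoff=0.8):
    """Return the closest real path for a misspelled one, or None."""
    matches = get_close_matches(bad_path, known_paths, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical list of pages that actually exist on the site.
known_paths = ["/blog/crawl-errors", "/about", "/contact"]

print(suggest_redirect("/blog/crawl-erors", known_paths))
print(suggest_redirect("/pricing", known_paths))
```

A fairly strict cutoff keeps unrelated URLs from being matched, which matters because redirecting a missing page to something unrelated is just another flavor of soft 404.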
In other circumstances, your site may get scraped and linked to with additional code tacked on. If you have a lot of links like this, you can 301 them using this method, although these links are often very low quality to begin with.
As above, if the 404 page has simply been moved, a 301 redirect to the new page should be implemented instead.
So, to wrap things up, you should only change a 404 to a 301 if the content has actually been moved. If the content is gone but the SEO value is high, you should put a new page in its place. Never link to a 404 page, and ideally, always link to the new location if a page has been moved, rather than to the 301'd page.
Image credit: Raphael Goetter