Let's talk about query strings, how they can cause problems for SEO, how to find them, and how to fix the SEO issues.
What Query Strings Are, And Why They're A Problem For SEO
Query strings, also called URL parameters or URL variables, are the parts of a web address that come after a question mark. For example, in...
...the color=blue portion of the URL is an example of a URL parameter (otherwise known as a query string or GET variable). URL parameters send information to the server. They are frequently used to filter or sort the content of a page, or to tag visits from a specific source, e.g. AdWords campaign tags.
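To make the anatomy concrete, here is a minimal Python sketch that splits a URL into its parent URL and its parameters using the standard library. The function name split_query is hypothetical, and the URL is the example.com address used later in this article.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def split_query(url):
    """Return (parent_url, parameters) for a URL.

    parent_url is the URL with the query string and fragment removed;
    parameters maps each parameter name to its list of values.
    """
    parts = urlsplit(url)
    parent_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    parameters = parse_qs(parts.query)
    return parent_url, parameters


parent, params = split_query("http://www.example.com/folder/product?color=blue")
print(parent)  # the URL without "?color=blue"
print(params)  # the parameter names mapped to their values
```

Values come back as lists because the same parameter name can legally appear more than once in a query string.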
Unfortunately, query strings can be problematic for SEO in two ways: by creating duplicate content and by causing keyword cannibalization.
Query strings create duplicate content when adding a query string to a URL makes no change, or no significant change, to the content of the page. A re-sorted or filtered version of a page is hardly different from the original, and a version of a page with URL tags for tracking is identical to the original. Duplicates like these aren't likely to pick up an all-out penalty, but they could hurt your site's quality score, which will diminish ranking potential.
Query strings can also create a large number of pages that target the same keyword phrase. This is a big problem if a page dedicated to a specific family of search phrases is competing with a page that ranks for them just because the content is sorted or filtered by a related phrase.
How To Find Pages With Query Strings
1. Get Screaming Frog
Go here and click "Download." Wait for it to download, click the file, and follow the installation instructions.
2. Crawl Your Site
After launching Screaming Frog, enter your domain name and hit "Start" to crawl the pages on your site:
Now wait for the status bar to climb to 100%:
3. Find the Pages With Query Strings
To display only the pages with query strings, click the "URI" tab, enter "\?" into the search bar, and hit the "enter" key on your keyboard:
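If you would rather work from an exported list of crawled URLs, the same filter can be sketched in a few lines of Python. This assumes you already have the URLs as a plain list of strings; the function name and the sample URLs are hypothetical.

```python
import re

# Same pattern as the search bar: an escaped, literal question mark.
QUERY_STRING = re.compile(r"\?")


def urls_with_query_strings(urls):
    """Keep only the URLs that contain a question mark."""
    return [url for url in urls if QUERY_STRING.search(url)]


crawled = [
    "http://www.example.com/folder/product",
    "http://www.example.com/folder/product?color=blue",
    "http://www.example.com/folder/",
]
for url in urls_with_query_strings(crawled):
    print(url)
```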
Identify Problematic Query Strings
Your server should resolve pages with query strings, so their mere existence is not the issue. The problem is that if you discover them with a crawl as above, it means that you are linking directly to query string URLs within your site architecture. This means that Google is likely to consider them URLs that represent a unique destination, rather than a sorted, filtered, or tagged version of the parent page.
We will discuss methods of telling the search engines to ignore query strings in a later section. Before doing that, we need to make sure we aren't throwing out the baby with the bathwater.
Start by sorting the query string URLs by "Status Code" in Screaming Frog, then copy the URLs.
Paste the URLs into a spreadsheet and categorize each one as "Duplicate," "Sorted/Filtered," or "Alternative":
Duplicate: The content is completely identical to the parent URL without the query string.
Sorted/Filtered: The content is altered in some way by back-end server activity, such as by sorting or filtering, but this makes little difference for the user, who can just sort, filter, or otherwise make changes to the content on the page through the interface.
Alternative: The query string version of the URL contains entirely different content from the parent URL.
Feel free to categorize these in chunks by the parent URL or, in some cases, where appropriate, by the parent folder.
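Grouping by parent URL can be done by hand in the spreadsheet, but a short script can also do it before you paste. The sketch below is one possible approach, assuming the query string URLs are available as a list of strings; group_by_parent is a hypothetical helper name.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def group_by_parent(urls):
    """Map each parent URL to the query string URLs that hang off it."""
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        if not parts.query:
            continue  # no query string, so nothing to categorize
        parent = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
        groups[parent].append(url)
    return dict(groups)


groups = group_by_parent([
    "http://www.example.com/folder/product?color=blue",
    "http://www.example.com/folder/product?color=red",
    "http://www.example.com/folder/other?sort=price",
])
for parent, children in groups.items():
    print(parent, len(children))
```

Each group can then be reviewed and labeled in one pass, since variations of the same parent usually fall into the same category.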
Resolve Problematic Query Strings
Fixing "Alternative" Query Strings
These pages should be moved to URLs that do not rely on a query string. Search engines may otherwise mistake them for duplicates of the parent page.
- Set up a new URL and place the content there.
- Update all URLs on the site that link to the query string so that they link to the new URL. You can find the pages that link to the query string in Screaming Frog by clicking on the query string URL and then clicking the "Inlinks" tab in the bottom section of the window.
- Set up a 301 redirect from the query string to the new URL.
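How the redirect is configured depends on your server, which this article does not assume. As a server-agnostic illustration, the lookup behind such a redirect might be sketched like this in Python. The redirect table, the view=specs parameter, and the /folder/product-specs path are all hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical table: (path, exact query string) -> new URL path.
REDIRECTS = {
    ("/folder/product", "view=specs"): "/folder/product-specs",
}


def resolve_redirect(url, redirects=REDIRECTS):
    """Return (301, new_path) if the URL has been moved, else None."""
    parts = urlsplit(url)
    target = redirects.get((parts.path, parts.query))
    if target is None:
        return None
    return 301, target


print(resolve_redirect("http://www.example.com/folder/product?view=specs"))
print(resolve_redirect("http://www.example.com/folder/product?color=blue"))
```

This sketch matches the query string exactly, so the same parameters in a different order would not match. A real setup may need to normalize the parameters first.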
Fixing "Duplicate" Query Strings
You should avoid linking to duplicates of the parent URL if at all possible, at least within your internal links. If this is being done for tracking purposes, consider using event tracking or other alternatives. As above, you can find the pages that link to the query string by looking at the "Inlinks" tab.
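When cleaning up internal links, a small helper can remove the tracking tags while leaving other parameters alone. The sketch below assumes the tags use a utm_ prefix, which is a common convention but not something this article specifies. Adjust TRACKING_PREFIXES to match your own tags.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: tracking parameters start with one of these prefixes.
TRACKING_PREFIXES = ("utm_",)


def strip_tracking(url):
    """Return the URL with tracking parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


print(strip_tracking("http://www.example.com/folder/product?utm_source=newsletter"))
```

If every parameter is a tracking tag, the result is the bare parent URL with no trailing question mark.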
Where you can't update the URL, you should tell the search engines to ignore the query strings as discussed in the next section.
Fixing "Sorted/Filtered" Query Strings
Ideally, sorting and filtering would not generate a separate query string URL at all. If this is not possible, the interface should be designed such that there are no hyperlinks to the query string versions of the pages, and the search engines should be told to ignore the query strings as discussed in the next section. As above, you can use the "Inlinks" tab in Screaming Frog to identify pages that link to query string URLs.
Instructing Search Engines To Ignore Query Strings
Setting Up Canonicalization
Canonicalization tells the search engines which version of a page is official. For example, if http://www.example.com/folder/product?color=blue is a duplicate or sorted/filtered version of http://www.example.com/folder/product, then http://www.example.com/folder/product?color=blue should feature a canonical tag as follows (placed within the <head>):
<link rel="canonical" href="http://www.example.com/folder/product">
You should actually set this up for every page, so that any unforeseen query strings are accounted for (such as tagged URLs for campaigns). If you are on WordPress, Yoast's SEO plugin can take care of this, and various alternatives exist.
DO NOT set this up until you have resolved your "Alternative" query strings at a minimum. Alternatives are not duplicates, and so should not canonicalize to a different URL. Furthermore, this is not a substitute for fixing your sorted, filtered, and duplicate query strings. Links to these query strings still throw away PageRank that should be going to the parent URL.
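If your templates are generated by code, the canonical tag can be built from the requested URL by dropping the query string. The following Python sketch shows one way to do that. The function name canonical_link_tag is hypothetical, and the sketch assumes every query string on the page is a duplicate or sorted/filtered variation, so it must not be used for "Alternative" pages.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link_tag(url):
    """Build a canonical tag that points at the parent URL.

    Assumes the query string never changes the page's content in a
    meaningful way (i.e. no "Alternative" query strings remain).
    """
    parts = urlsplit(url)
    parent = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return f'<link rel="canonical" href="{escape(parent, quote=True)}">'


print(canonical_link_tag("http://www.example.com/folder/product?color=blue"))
```

Because the parent URL passes through unchanged, the same function also produces the self-referencing tag for pages requested without a query string.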