High Body Keyword Density
Keyword Stuffing penalties arise from abusing a once extremely effective tactic: sculpting Keyword Density to a high level. Our own experiments have shown that penalties can appear at densities as low as 6%, though TF-IDF (covered earlier) is likely at play, and the threshold is sensitive to topics, word types, and context.
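As a rough illustration of what "density" means here (and not a reproduction of Google's scoring), a minimal Python sketch that computes the density of a phrase within a block of text:

    import re

    def keyword_density(text: str, keyword: str) -> float:
        """Rough keyword density: words belonging to the phrase / total words.
        Illustrative approximation only, not Google's formula."""
        words = re.findall(r"[a-z0-9']+", text.lower())
        if not words:
            return 0.0
        phrase = keyword.lower().split()
        n = len(phrase)
        hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == phrase)
        return hits * n / len(words)

    # An obviously stuffed example prints a density far above the ~6% range noted above.
    sample = "Cheap widgets. Buy cheap widgets today, because cheap widgets are cheap."
    print(f"{keyword_density(sample, 'cheap widgets'):.1%}")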
Keyword Dilution
This factor manifests itself from logic: if a higher Keyword Density or TF-IDF is positive, at some point, a total lack of frequency/density will decrease relevance. As Google has improved at understanding natural language, this may be better described as Subject Matter Dilution: writing content that wanders without any clear theme. The same basic concept is at play either way.
Keyword-Dense Title Tag
Aside from the page as a whole, Keyword Stuffing penalties appear to be possible within the title tag itself. At absolute minimum, there is no benefit in using the same keyword five times in the same tag, and attempting it invites the same stuffing signals.
Exceedingly Long Title Tag
Aside from keyword repetition, the sheer length of the title tag matters as well. An ideal title tag should definitely be less than 60-70 characters and still provide enough value to function as a good search ad in Google's results; anything well beyond that point adds no benefit.
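For illustration only, a minimal Python sketch of both checks; the 70-character ceiling and the repetition cutoff are taken from the guidance above, not from any documented Google threshold:

    from collections import Counter
    import re

    def audit_title(title: str, max_chars: int = 70, max_repeats: int = 4) -> list:
        """Flag title tags that are overly long or repeat a single word too often."""
        issues = []
        if len(title) > max_chars:
            issues.append(f"title is {len(title)} characters (over {max_chars})")
        counts = Counter(re.findall(r"[a-z0-9']+", title.lower()))
        for word, count in counts.items():
            if count > max_repeats and len(word) > 2:
                issues.append(f"'{word}' appears {count} times")
        return issues

    print(audit_title("Shoes | Cheap Shoes | Buy Shoes | Best Shoes | Shoes Online Shoes Store"))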
Keyword-Dense Heading Tags
Heading tags, such as H1, H2, H3, and so on, can add additional weight to certain words. Those attempting to abuse this positive ranking factor will find that they can't simply cram as many keywords as possible into these tags, even when the tags themselves are no longer than usual. Keyword Stuffing penalties appear to be possible simply as a function of how much of the limited space within these tags is devoted to keywords.
Heading Tag (H1, H2, etc.) Overuse
As a general rule, if you want a concrete answer as to whether an SEO penalty exists, try pushing a positive ranking factor well beyond what seems sane. One easily verified penalty involves placing your entire website in an H1 tag. Too lazy for that? Matt Cutts drops a less-than-subtle hint about too much text in an H1 in this source.
URL Keyword Repetition
While there don't seem to be any penalties associated with using a word in a URL multiple times, the value added by keyword repetition in a URL appears to be essentially nothing. This can be verified very simply by placing a word in a URL five times instead of just once.
Source(s): Speculation
Exceedingly Long URLs
Matt Cutts notes that after about five words, the additional value of words in a URL dwindles. It's theorized, and fairly replicable, that exceedingly long URLs can also draw penalties in Google, although this is not directly confirmed. Bing, although it operates somewhat differently, has gone out of its way to confirm that URL keyword stuffing is a penalty in its engine.
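As an illustration of the "about five words" remark, a minimal Python sketch that counts the words in a URL path; the five-word threshold is Cutts's rule of thumb, not a documented cutoff:

    from urllib.parse import urlparse
    import re

    def url_word_count(url: str) -> int:
        """Count hyphen/slash-separated words in the URL path."""
        path = urlparse(url).path
        return len(re.findall(r"[a-z0-9]+", path.lower()))

    for url in ("https://example.com/blue-widgets",
                "https://example.com/best-cheap-blue-widgets-for-sale-online-2016"):
        words = url_word_count(url)
        print(url, "->", words, "words", "(value likely dwindles)" if words > 5 else "")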
Keyword-Dense ALT Tags
Given that ALT text is not generally visible on the page, ALT attribute keyword stuffing has been widely abused. A few descriptive words are fine and actually ideal, but packing the attribute with keywords beyond that can invite penalties.
Exceedingly Long ALT Tags
The same logic applies to sheer length. Because ALT text is hidden from most users, it's tempting to let it run long, but an ALT attribute should remain a brief description of the image; going well beyond a few descriptive words can invite penalties.
Long Internal Link Anchors
At minimum, really long internal anchor text brings no additional value with it, which amounts to a devaluation. In extreme circumstances, it appears possible to draw Keyword Stuffing webspam penalties from exceedingly lengthy anchor text.
Source(s): Speculation
High Ratio of Links to Text
It's theorized that a site that's all links and no substance is the mark of a low-quality site. This fits the broader narrative around content quality and Google's reluctance to rank pages that look too much like search results pages, but it is not currently supported by a study as proof.
Source(s): Speculation
Too Much "List-style" Writing
Matt Cutts has suggested that any style of writing that simply lists a lot of keywords can also fit the description of keyword stuffing. Example: listing way too many things, words, wordings, ideas, notions, concepts, keywords, keyphrases, etc. is not a natural form of writing. Too much of this sort of thing will draw devaluations and possibly penalties.
JavaScript-Hidden Content
Although Google recommends against putting text in JavaScript because it may be unreadable by search engines, that does not mean that Google does not crawl JavaScript. In extreme instances where JavaScript is used to hide or alter text that is otherwise present in the page's HTML, it may still be possible to receive a cloaking penalty.
CSS-Hidden Content
One of the first and most well-documented on-page SEO penalties: intentionally hiding text or links from users, especially for the sake of loading the page up with keywords meant only for Google, can invite a nasty penalty. Some leeway appears to be given in legitimate circumstances, such as tabs or tooltips.
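For self-auditing, a minimal sketch that flags the simplest case, inline styles that hide content; real detection (external stylesheets, computed styles, off-screen positioning) is far more involved:

    import re

    HIDDEN_STYLE = re.compile(
        r'style\s*=\s*"[^"]*(display\s*:\s*none|visibility\s*:\s*hidden)[^"]*"',
        re.IGNORECASE)

    def find_inline_hidden(html: str) -> list:
        """Return opening tags that hide their content via inline CSS."""
        return [m.group(0) for m in re.finditer(r"<[^>]+>", html)
                if HIDDEN_STYLE.search(m.group(0))]

    page = '<div style="display:none">cheap widgets cheap widgets cheap widgets</div>'
    print(find_inline_hidden(page))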
Foreground Matches Background
Another common issue that brings about cloaking penalties occurs when the foreground color of certain content matches the background color behind it. Google may use its Page Layout algorithm here, which actually renders the page visually, to prevent false positives. In our experience, this can still occur accidentally in a handful of scenarios.
Single Pixel Image Links
Once a popular webspam tactic for disguising hidden links, there's no question that Google will treat "just really small" links as hidden links. This might be done with a 1px by 1px image or simply incredibly small text. If you're attempting to fool Google using such methods, odds are they're going to catch you eventually.
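A minimal self-audit sketch for single-pixel image links, assuming the dimensions are declared as width/height attributes (CSS-sized images would require a rendered check):

    from html.parser import HTMLParser

    class TinyImageLinkFinder(HTMLParser):
        """Flag <img> tags with 1px dimensions that appear inside an <a> element."""
        def __init__(self):
            super().__init__()
            self.in_link = 0
            self.flags = []
        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "a":
                self.in_link += 1
            elif tag == "img" and self.in_link:
                if attrs.get("width") == "1" and attrs.get("height") == "1":
                    self.flags.append(attrs.get("src", "(no src)"))
        def handle_endtag(self, tag):
            if tag == "a" and self.in_link:
                self.in_link -= 1

    finder = TinyImageLinkFinder()
    finder.feed('<a href="/spam"><img src="dot.gif" width="1" height="1"></a>')
    print(finder.flags)  # ['dot.gif']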
Empty Link Anchors
Hidden links, although often implemented differently than hidden text, through means such as empty anchor text, are also likely to invite cloaking penalties. This is dangerous territory and another once-widespread webspam tactic, so be sure to double-check your code.
Copyright Violation
Publishing content in a manner that is in violation of the Digital Millennium Copyright Act (DMCA) or similar codes outside of the U.S. can lead to a severe penalty. Google attempts to analyze unattributed sources and unlicensed content automatically, but users can also report infringement directly, resulting in manual action.
Doorway Pages
Doorway Pages, or Gateway Pages, are masses of pages whose only value is to soak up search traffic. They do not provide value to the user. For example, creating a near-identical product page for every city name in America, each swapping in its own keywords. This method is a form of "spamdexing" (spamming a search engine's index of pages).
Overuse Bold, Italic, or Other Emphasis
At minimum, if you place all of the text on your site within a bold tag because such text is often given additional weight compared to the rest of the page, you haven't cracked some code that makes your whole site rank better. This sort of activity fits Google's frequent blanket description of "spammy activity", and we have verified such penalties in our own non-public studies for clients.
Broken Internal Links
Broken internal links make a site more difficult for search engines to index and more difficult for users to navigate. It's a tell-tale sign of a low quality website. Make sure your internal links are never broken.
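A minimal sketch of spot-checking internal link targets for non-200 responses; the example.com URLs are placeholders, and a production crawler would also need politeness delays, retries, and deduplication:

    import urllib.request
    import urllib.error

    def status_of(url: str) -> int:
        """Return the HTTP status code for a URL, or 0 if unreachable."""
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.status
        except urllib.error.HTTPError as err:
            return err.code
        except urllib.error.URLError:
            return 0

    internal_links = ["https://example.com/", "https://example.com/no-such-page"]
    for link in internal_links:
        code = status_of(link)
        if code != 200:
            print(f"Broken or problematic internal link: {link} -> HTTP {code}")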
Redirected Internal Links
PageRank patents and papers, as well as Matt Cutts, have suggested that redirects are subject to "PageRank decay", meaning some authority is lost every time one page redirects to another. In 2016, however, Google's Gary Illyes posted a rather direct tweet stating that this is no longer the case.
Text in Images
Google has come a long way at analyzing images, but on the whole, it's very unlikely that text presented inside an image will be searchable in Google. There's no direct devaluation or penalty when you put text in an image; it just prevents your site from having any chance to rank for those words.
Text in Video
Just like with images, the words spoken or displayed in a video can't be reliably accessed by Google. If you publish video, it's to your benefit to also publish a text transcript so that the content of your video is fully searchable. This is true regardless of the rich media format, including HTML5 video, Flash, Silverlight, and others.
Text in Rich Media
Google has come a long way at analyzing images, videos, and other formats of rich media such as Flash, but on the whole, it's very unlikely that text presented in rich media will be searchable in Google. As with images and video, there's no devaluation or penalty here; the text simply has no chance to rank.
Frames/Iframes
In the past, search engines were entirely unable to crawl through content located in frames. Though they've overcome this weakness to an extent, frames do still present a stumbling point for search engine spiders. Google attempts to associate framed content with a single page, but it's far from guaranteed that this will be processed correctly.
Dynamic Content
Dynamic content can create a number of challenges for search engine spiders to understand and rank. Minimizing the use of such content, and applying noindex where it is accessible to Google but adds little value, is believed to result in a better overall user experience and is likely to draw preferential treatment in rankings.
Thin Content
Although it's always been better to write more elaborate content that covers a topic thoroughly, the introduction of Navneet Panda's "Panda" algorithm established a situation where content with essentially nothing of unique value is severely punished in Google. A widely recognized case study on Dani Horowitz's "DaniWeb" forum profile pages serves as an excellent example of Panda's most basic effects.
Domain-Wide Thin Content
For a very long time, Google has made an effort to understand the quality and unique value presented by your content. With the introduction of the Panda algorithm, this became an issue that is scored domain-wide, rather than on a page-by-page basis. As such, it's now usually beneficial to improve the average quality of the content you expose to search engines, while using 'noindex' on pages that are doomed to be repetitive and uninteresting, such as blog "tag" pages and forum user profiles.
Too Many Ads
Pages with too many ads, especially above-the-fold, create a poor user experience and will be treated as such. Google appears to base this on an actual screenshot of the page. This is a function of the Page Layout algorithm, also briefly known as the Top Heavy Update.
Use of Pop-ups
Although Google's Matt Cutts answered no to this question in 2010, Google's John Mueller said yes in 2014. After weighing both responses and understanding the process behind the Page Layout algorithm, our tie-breaking ruling is also "yes": using pop-ups can definitely harm your search rankings.
Duplicate Content (3rd Party)
Duplicate content that appears on another site can bring about a significant devaluation, even when it doesn't violate copyright guidelines and properly cites its source. This falls in line with a running theme: content that is genuinely unique against the backdrop of the web as a whole performs better.
Duplicate Content (Internal)
As with content duplicated from another source, any snippet of content that is duplicated within a page, or across the site as a whole, will endure a decrease in value. This is an extremely common issue and can creep up from anything ranging from too many indexed tag pages to www vs. non-www versions of the site to variables appended to URLs.
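One common mitigation is to normalize every internal URL to a single canonical form before linking or indexing. A minimal sketch, assuming you've standardized on HTTPS, the www host, and no query strings (all of which are choices, not requirements):

    from urllib.parse import urlparse, urlunparse

    def canonicalize(url: str, preferred_host: str = "www.example.com") -> str:
        """Collapse www/non-www and tracking-parameter variants to one canonical URL.
        The preferred host and the decision to drop all query strings are assumptions."""
        parts = urlparse(url)
        path = parts.path or "/"
        return urlunparse(("https", preferred_host, path, "", "", ""))

    for variant in ("http://example.com/page?utm_source=news",
                    "https://www.example.com/page"):
        print(canonicalize(variant))  # both map to https://www.example.com/page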
Linking to Penalized Sites
This was introduced as the "Bad Neighborhood" algorithm. To quote Matt Cutts: "Google trusts sites less when they link to spammy sites or bad neighborhoods". Simple as that. Google has suggested using the rel="nofollow" attribute if you must link to such a site. To quote Matt again: "Using nofollow disassociates you with that neighborhood."
Slow Website
Slow sites will not rank as well as fast ones. Google has your target audience in mind here, so consider geography, devices, and connection speeds of individuals. Google has repeatedly suggested "under two seconds", and says that they aim for under 500ms.
Page NoIndex
If a page contains the "robots" meta tag with a value of "noindex", Google will never place it in its index. If used on a page that you want to rank, that's a bad thing. It can also be a good thing when removing pages that will never be useful to Google users, elevating the average experience of visitors arriving from Google.
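For reference, the directive in question is a "robots" meta tag containing "noindex"; a minimal sketch that detects it in a page's HTML:

    from html.parser import HTMLParser

    class NoindexFinder(HTMLParser):
        """Detect <meta name="robots" content="noindex"> (including "noindex, follow" etc.)."""
        def __init__(self):
            super().__init__()
            self.noindex = False
        def handle_starttag(self, tag, attrs):
            attrs = {k.lower(): (v or "").lower() for k, v in attrs}
            if tag == "meta" and attrs.get("name") == "robots" and "noindex" in attrs.get("content", ""):
                self.noindex = True

    finder = NoindexFinder()
    finder.feed('<head><meta name="robots" content="noindex, follow"></head>')
    print(finder.noindex)  # True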
Source(s): Logic
Internal NoFollow
This can appear in two ways: if a page contains the "robots" meta tag with the value "nofollow", that implies the rel="nofollow" attribute is added to every link on the page; alternatively, it can be added to individual links. Either way, it is taken to mean "I don't trust this", "crawl no further", and "do not pass this PageRank". Matt does not mince words here: just never "nofollow" your own site.
Disallow Robots
If your site has a file named robots.txt in the root directory with a "User-agent" line matching either "*" or "Googlebot", followed by a "Disallow: /" rule, your site will not be crawled. This will not remove your site from the index, but it will prevent any updating with fresh content, or any positive ranking factors that surround age and freshness.
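For clarity, the blocking configuration described above looks like the two-line file in the comment below; the standard library's robotparser can confirm what Googlebot would be allowed to fetch (the file contents and URL are placeholders):

    from urllib.robotparser import RobotFileParser

    # A robots.txt that blocks all crawling for every user agent:
    #   User-agent: *
    #   Disallow: /
    robots_txt = "User-agent: *\nDisallow: /\n"

    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    print(parser.can_fetch("Googlebot", "https://example.com/any-page"))  # False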
Poor Domain Reputation
Domain names maintain a reputation with Google over time. Even if a domain changes hands and you are now running an entirely different web site, it's possible to suffer from webspam penalties incurred by the poor behavior of previous owners.
IP Address Bad Neighborhood
While Matt Cutts has gone out of his way to debunk the long-standing practice of "SEO web hosting" on dedicated IP addresses as serving any real benefit, this is complicated by the fact that, in rare cases, Google has penalized entire server IP ranges where they are associated with a private network or bad neighborhood.
Text in JavaScript
While Google continues to improve at crawling JavaScript, there's still a fair chance that Google will have trouble crawling content that's printed using JavaScript, and further concern that Googlebot won't fully understand the context of when it gets printed and to whom. While printing text with JavaScript won't cause a penalty, it's an undue risk and therefore a negative factor.
Poor Uptime
Google can't (re)index your site if they can't reach it. Logic also dictates that an unreliable site leads to a poor Google user experience. While one outage is unlikely to be devastating to your rankings, achieving reasonable uptime is important; a day or two of downtime should be fine, but more than that will cause problems.
Private Whois
While it's often pointed out that Google can't always access whois data from every registrar, Matt Cutts made it clear at PubCon 2006 that they were still looking at this data, and that private whois, when combined with other negative signals, may lead to a penalty.
False Whois
Similar to private whois data, representatives from Google have made it clear that they are aware of this common trick and treat it as a problem. If for no reason other than it being a violation of ICANN guidelines, and potentially allowing a domain hijacker to steal your domain via a dispute without you getting a say, don't use fake information to register a domain.
Penalized Registrant
If you subscribe to the notion that private and false whois records are bad, and take into account that Matt Cutts has discussed using this as a signal that identifies webspam, it stands to reason that a domain owner can be flagged and penalized across numerous sites. This is unconfirmed and purely speculative.
Source(s): Speculative
ccTLD in Global Ranking
ccTLDs are country-specific domain suffixes, such as .uk and .ca, as opposed to gTLDs, which are global. They are useful in executing international SEO, but can be equally problematic when attempting to rank outside of their countries. An exception to this rule is that a small number of ccTLDs, such as .co, have been widely used for other purposes and have been labeled by Google as "gccTLDs".
Too Many Internal Links
Matt Cutts once stated that there was a hard limit of 100 links per page, which was later softened to "keep it to a reasonable number". The original limit existed because Google once would not download more than 100K of a single page. That's no longer true, but since every link divides the PageRank you distribute, this potential negative still makes sense without any altered understanding of how Google works.
Too Many External Links
As a simple function of the PageRank algorithm, it's possible to "leak PageRank" out from your domain. Note, however, that the negative factor here is "too many" external links. Linking out to a reasonable number of external sites is a positive ranking factor, confirmed by Mr. Cutts in the same source article for this factor.
Invalid HTML/CSS
Matt Cutts has said no to this being a factor. Despite this, our experience has consistently indicated yes. Code likely doesn't have to be perfect, and this may be an indirect effect, but the negative effects of bad code are supported by logic once you consider other code-related factors (hint: there's a code filter up top). Bad code can cause countless, potentially invisible issues involving tag usage, page layout, and cloaking.
Outbound Affiliate Links
Google has vocally taken action against affiliate sites that provide "no additional value" in the past; it's in the guidelines. There's much SEO paranoia surrounding hiding affiliate links with a 301 redirect in a directory blocked by robots.txt, although Google can view HTTP headers without navigating. A number of affiliate marketers have reported reasonably scientific case studies of penalties from too many affiliate links, so we rate this as likely.
Parked Domain
A parked domain is a domain that does not yet have a real website on it, often sitting unused at a domain registrar with nothing but machine-generated advertising. These days, such a page fails to meet so many other ranking criteria that it probably wouldn't have much success in Google anyway. Parked domains once had some. But Google has repeatedly made it clear that they don't want to rank parked domains of any kind.
Search Results Page
Generally speaking, Google wants users to land on content, not other pages that look like listings of potential content, like the Search Engine Results Page (SERP) that such a user just came from. If a page looks too much like a search results page, by functioning as just an assortment of more links, it's likely to not rank as well. This may also apply to blog posts outranking tag/category pages.
Automatically Generated Content
Machine-generated content that's based upon the user's search query will "absolutely be penalized" by Google and is considered a violation of the Google Webmaster Guidelines. There are a number of methods that could qualify, which are detailed in the Guidelines. One exception to this rule appears to be machine-generated meta tags.
Infected Site
Many website owners would be surprised to know that most compromised web servers are not defaced. Often, the offending party will actually go so far as to patch your security holes to protect their newfound property, without you ever knowing. This will then manifest itself in the form of malicious activity enacted on your behalf such as virus/malware distribution and further exploits, which Google takes very seriously.
Phishing Activity
If Google might have reason to confuse your site with a phishing scheme (such as one that aims to replicate another's login page to steal information), prepare for a world of hurt. For the most part, Google simply uses a blanket description of "illegal activity" and "things that could hurt our users", but in this interview, Matt specifically mentions their anti-phishing filter.
Outdated Content
A Google patent exists surrounding stale content, which can be identified in a variety of ways; one such method for defining stale content is simply its age. What is unclear is whether this factor harms rankings on all queries, or only when a particular search query is associated with something Google refers to as Query Deserves Freshness (QDF), which means exactly what it sounds like.
Orphan Pages
Orphan pages, or pages of your site that can't be found through your internal link architecture, may be penalized as Doorway Pages. At minimum, these pages do not benefit from internal PageRank and will therefore suffer in rankings.
Sexually Explicit Content
While Google does index and return X-rated content, it's not available when their Safe Search feature is turned on, which is Google's default state. It's therefore reasonable to consider that unmoderated user-generated content or one-time content that inadvertently crosses a certain line may be blocked by the Safe Search filter.
Selling Links
Matt Cutts presents a case study where the toolbar PageRank of a domain decreased from seven to three as a direct result of outbound paid links. As a violation of Google's Webmaster Guidelines, it appears that directly selling links that pass PageRank can lead to penalties on both the on-page and off-page ends of a site.
Subdomain Usage (N)
Subdomains (thing.yoursite.com) are often viewed as separate websites by Google, as compared to subfolders (yoursite.com/thing/), which are not. This can be negative in a number of ways as it relates to other factors. One such scenario would involve a single, topical site with many subdomains, not benefiting from factors on this page that have "domain-wide" in their names.
Number of Subdomains
The number of subdomains on a site appears to be the most significant factor in determining whether subdomains are each treated as their own sites. Using an extremely large number of subdomains, although not a terribly easy thing to do by mistake, could theoretically cause Google to treat one site like many sites, or many sites like one site.
Source(s): Speculation
HTTP Status Code 4XX/5XX on Page
If your web server returns pretty much anything other than a status code of 200 (OK) or 301/302 (redirect), it is signaling that the appropriate content was not delivered. Note that this can happen even if you are able to view the intended content yourself in your browser. In cases where content is genuinely missing, Google has clarified that a 404 error is fine and actually expected.
Source(s): Speculation
Domain-wide Ratio of Error Pages
Presumably, the possibility for users to land on pages that return 4XX and 5XX HTTP errors is a mark of an overall low-quality website. We speculate this is a problem in addition to pages that are not indexed due to carrying such an HTTP header, and pages that include broken outbound links.
Source(s): Speculation
Code Errors on Page
Presumably, if a page is full of errors generated by PHP, Java, or another server-side language, it meets Google's definition of a poor user experience and a low-quality site. At absolute minimum, error messages within the page text likely interfere with Google's overall analysis of the text on the page.
Source(s): Speculation
Soft Error Pages
Google has repeatedly discouraged the use of "soft 404" pages or other soft error pages. These are error pages that still return HTTP code 200 in the document headers. Logically, this is difficult for Google to process correctly: even though your users see an error page, Google may, at minimum, treat these as actual low-quality pages on your site, significantly lowering how the overall quality of your domain's content is scored.
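A common self-test for soft error pages is to request a URL that cannot exist and confirm the server answers with a genuine 404 rather than a 200. A minimal sketch, with example.com standing in for your own domain:

    import urllib.request
    import urllib.error
    import uuid

    def serves_soft_404(base_url: str) -> bool:
        """Request a URL that should not exist; a 200 response suggests soft error pages."""
        probe = f"{base_url.rstrip('/')}/{uuid.uuid4().hex}"
        try:
            with urllib.request.urlopen(probe, timeout=10) as resp:
                return resp.status == 200  # an "error page" served as 200 is a soft 404
        except urllib.error.HTTPError:
            return False  # a real 404/410 is the healthy outcome
        except urllib.error.URLError:
            return False  # unreachable; inconclusive

    print(serves_soft_404("https://example.com"))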
Outbound Links
On some level, something known as "PageRank leakage" does exist: you only have so many "points" to distribute, and "points" that leave your site cannot circle immediately back. But Matt Cutts has confirmed that there are other controls that specifically reward some genuinely relevant and authoritative outbound links. Websites are meant to be intersections, not cul-de-sacs.
Sitemap Priority
Many theorize that the "priority" attribute assigned to individual pages in an XML sitemap has an impact on crawling and ranking. Much like other signals that you might hand to Google via Search Console, it seems unlikely that pages would rank higher just because you asked; it's mainly useful as a signal to de-prioritize less important content.
Sitemap ChangeFreq
The changefreq value in an XML sitemap is intended to indicate how often the content changes. It's theorized that Google may not re-crawl content faster than you say it changes. It's unclear whether Google actually follows this attribute, but if they do, it would yield a similar result to adjusting the crawl rate in Google Search Console.
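For reference, both priority and changefreq live on each url entry of the XML sitemap; a minimal sketch that writes one out with the standard library (the URLs and values are placeholders):

    import xml.etree.ElementTree as ET

    NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=NS)

    for loc, priority, changefreq in [
        ("https://example.com/", "1.0", "daily"),
        ("https://example.com/tag/widgets/", "0.1", "monthly"),  # de-prioritized thin page
    ]:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "priority").text = priority
        ET.SubElement(url, "changefreq").text = changefreq

    print(ET.tostring(urlset, encoding="unicode"))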
Keyword-Stuffed Meta Description
It's theorized that, even though Google now tells us that they don't use meta descriptions in web ranking, only for ads, it may still be possible to send webspam signals to Google if there's an apparent attempt to abuse the tag.
Source(s): Speculation
Keyword-Stuffed Meta Keywords
Since 2009, Google has said that they don't look at meta keywords at all. Despite this, the tag is still widely abused by people who don't understand or believe that idea. It's theorized that because of the latter fact, this tag may yet serve to send webspam signals to Google.
Spammy User-Generated Content
Google can single out problems appearing in the user-generated portions of your site and issue very targeted penalties in that context. This is one of the few circumstances where a warning may appear in Google Search Console. We're told these penalties are usually limited to specific pages. We've found that WordPress trackback spam appearing in a hidden DIV is one way that this penalty can creep up undetected.
Foreign Language Non-Isolation
Obviously, if you write in a language that doesn't belong to your target audience, almost no positive on-page factors can work their charm. Matt Cutts admits that improperly isolated foreign-language content can be a stumbling point both for search spiders and for users. For positive ranking factors to work as intended, Google needs to be able to interrelate the content on a page, as well as the sections of a site.
Auto-Translated Text
Using Babelfish or Google Translate to rapidly "internationalize" a site is a surprisingly frequent practice for something that Matt Cutts explicitly states is a violation of their Webmaster Guidelines. For those fluent in Google-speak, that usually means "it's not just a devaluation, it's a penalty, and probably a pretty bad one". In a Google Webmaster video, Matt categorizes machine translations as "auto-generated content".
Missing Robots.txt
As of 2016, Google Search Console advises site owners to add a robots.txt file when one is missing. This has led many to theorize that a missing robots.txt file is bad for rankings. We consider this odd, given that Google's John Mueller advises removing robots.txt entirely when Googlebot is fully welcome. We chalk this myth up to departmental miscommunication.
All nofollow
In an impressively inconclusive video, Matt Cutts tells us that Google "would like to see" sites like Wikipedia hand-selecting a few links to not be "nofollow", but never states the value. The apparent ranking success of sites with 100% "nofollow" on their outbound links, like Wikipedia, seems to suggest that there's no significant harm done. If anything at all, they may lose some positive value attributed to good outbound links.
Site Lacks Theme
One of the most popular case studies following Panda's launch was of HubPages, which ultimately repaired its damage by using subdomains to isolate many unrelated sites from one. While the Hilltop update apparently began rewarding domains for having a core expertise in 2004, Panda apparently began punishing a lack thereof in 2011.
Weak SSL Ciphers
SSL encryption is confirmed as a positive factor, which suggests that Google wants to reward superior security for its users. So is it possible that Google also rewards the quality of that security? It would be incredibly easy for Google to test SSL ciphers, even easier than their current, confirmed malware tests. But at present, we have no evidence beyond it being a logical fit.
Source(s): Speculation
Commercial Queries (YMYL)
Google frequently uses the phrase "commercial queries" to refer to searches related to a transaction. The Quality Rater Guidelines ask quality raters to identify "Your Money or Your Life (YMYL)" content, reflecting a heightened concern for legitimacy on searches related to money and health. It's not certain what the full impact of this is on the search algorithm, but a search for "commercial queries" relates this concept to several other signals.
Comment Spam