"Black Hat" is Still Working


MANDATORY DISCLAIMER: If you want to build a long term online presence, "black hat" isn't the way to do it.

With that out of the way, there's a surprising amount of ignorance in this industry about the world of "black hat" SEO, and a completely false belief that in the post-Panda/Penguin world, it's somehow impossible to rank using it. Then we run into case studies like these, or these, which demonstrate that it is, in fact, possible for sites to rank for mega-competitive keywords on top of obvious spam links. And to do it again, and again.

I always find the comment sections on these kinds of posts amusing. The sheer shock and horror that Google has failed them as the guardian of all things pure and noble is laughable.

Here's the thing: Google is still just a machine. With all its machine learning and server farms, it still comes down to that. And for everybody who thinks they know exactly what Google needs to do in order to mop up messes like these, let me give you a short lesson in conditional probability and the notion of the false positive.

Here's a hypothetical:

  • Google develops a machine learning algorithm that accurately diagnoses spam sites 99 percent of the time, and only issues a false positive 1 percent of the time
  • 0.5 percent of the sites on the web are spam

Intuitively, we want to believe that this means nearly all of the spam sites would be eliminated, and only a few innocent sites would be hit. But it doesn't work that way.

What this really means is that if you have a spam site, this new algorithm will catch it 99 percent of the time. But it also means that 1 percent of all innocent sites would be hit. Assuming only 0.5 percent of the sites on the web are spam, the algorithm would wipe out over twice as many innocent sites as spam sites.

And the numbers I've used here are unrealistically optimistic in Google's favor.
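The arithmetic behind that claim is easy to check. Here's a quick sketch using the hypothetical numbers above, applied to an assumed population of 100,000 sites (the population size is arbitrary; the ratio is what matters):

```python
# Hypothetical numbers from the text: a 99% detection rate,
# a 1% false-positive rate, and a 0.5% spam base rate.
total_sites = 100_000
spam_rate = 0.005
detection_rate = 0.99
false_positive_rate = 0.01

spam_sites = total_sites * spam_rate                  # 500 spam sites
innocent_sites = total_sites - spam_sites             # 99,500 innocent sites

spam_caught = spam_sites * detection_rate             # spam correctly flagged
innocent_hit = innocent_sites * false_positive_rate   # innocent sites wrongly flagged

print(f"Spam sites removed:     {spam_caught:.0f}")
print(f"Innocent sites removed: {innocent_hit:.0f}")
print(f"Innocent-to-spam ratio: {innocent_hit / spam_caught:.2f}")
```

Of the 500 spam sites, 495 get caught; but of the 99,500 innocent sites, 995 get wrongly flagged, roughly twice as many. Put another way, a site flagged by this algorithm only has about a one-in-three chance of actually being spam. That's the base-rate problem in a nutshell.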

Most people don't understand just how difficult the job of a Google anti-spam engineer really is. They need to identify factors that almost zero innocent sites share, and that is extremely difficult. To avoid wiping out innocent sites, they have to focus on factors that will only catch a fraction of the spam sites on the web, except in the obvious cases, like 100 percent exact-match anchor text links.

This is why the impact of Penguin 2.0 was so small compared to the first one.

These commonly proposed solutions will not work for Google:

  • Switch to social signals - While Google will certainly incorporate social signals, and may already have, it will never switch over entirely. Most niches don't lend themselves to social sharing, so wiping out every site that has never picked up social attention would hit far more innocent sites than spam sites.
  • Ignore overnight link profiles - These happen all the time to legitimate sites. It's pretty much the definition of viral content.
  • Knock out pages with no content - Until Google learns to understand images and videos, it can't wipe out pages just because there is little or no text. Panda used machine learning algorithms to detect content written in an "EzineArticles" style, and that's about it.

Don't misunderstand me. Google can devalue signals, and it can use combinations of negative factors to reduce collateral damage, but this will not eliminate spam sites from the web. It will only remove a portion of them. That's the nature of this battle. Any method that would wipe out all spam sites would be akin to dropping a hydrogen bomb on the web.

We advise against black hat tactics because they're risky, not because they guarantee failure. There are no guarantees in SEO, which is why SEO for SEO's sake is a bad idea under any circumstances. If you're not using it to build a business, you're going to lose.

Image credit: Patty Maher