Why Rand Fishkin Is Wrong About Correlation

...Okay, here we go...

Look, I don't like to pick internet fights for attention. I respect Rand Fishkin's opinions on a lot of things in the industry. I use Moz's little title tag tool practically every day. I've published posts at Moz. But there are times when an idea gets perpetuated and it just has to be put to a stop. What follows isn't about Rand, or Moz. Heck, in my opinion, it's also much, much bigger than a misconception in the SEO community.

So, Rand just published this blog post today, and in it he says this:

Today I'm going to make a crazy claim--that in modern SEO, there are times, situations, and types of analyses where correlation is actually MORE interesting and useful than causality.

Alright. A lot of interesting things come from correlation studies. They are a good jumping off point. They can tell us where causality might exist. But he is so very, very dead wrong about correlation ever being more interesting or useful than causality.

To give Rand his fair treatment, let me summarize the point of his blog post: SEOs should sometimes spend more time thinking about what Google is trying to rank, rather than specifically how. I want to be 100% clear before I move on: I don't disagree with that point. What I disagree with is what appears to be a fundamental misunderstanding of the differences between causation and correlation. I'm not just throwing an academic conniption fit here, either, and here's why.

The Misunderstanding That Follows Can Be Dangerous

The central problem is this: correlation studies are a poor way to guess what kinds of things Google wants to rank.

Here's something Rand said in the post that betrays his fundamental misunderstanding:

If many high-ranking sites in your field are offering mobile apps for Android and iOS, you may be tempted to think there's no point to considering an app-strategy just for SEO because, obviously, having an app doesn't make Google rank your site any higher. But what if those mobile apps are leading to more press coverage for those competitors, and more links to their site, and more direct visits to their webpages from those apps, and more search queries that include their brand names, and a hundred other things that Google maybe IS counting directly in their algorithm? [Emphasis mine]

Here's what Rand seems to miss. If putting an app up causes you to get more press, which causes you to get more links, causes more people to visit your site, and causes people to search for your brand, and those actions in turn cause Google's algorithm to improve your rankings, this is a cause-and-effect relationship.

What Rand describes here is not mere correlation; it's indirect causation.

And he made this mistake more than once:

...what if those TV ads drive searches and clicks, which could lead directly to rankings?

Again, this is not just correlation; it's indirect causation.

So, what's the big deal? Isn't this all just a semantic debate? Don't I agree with him that we shouldn't obsess so much with figuring out how Google's algorithm works, since that's actually impossible? Yes? So what's the problem?

Here's The Problem (2 Big Problems, Actually)

1. When we confuse correlations with indirect causes, we stop caring whether our actions matter

Let me just come out and say this: the SEO industry will never unravel the increasingly complex mysteries of Google's algorithm.

That doesn't mean we can't identify things we can do that tend to improve a client's visibility in the search results.

The key is to focus on indirect causes.

In psychology, nobody pretends that they are ever going to discover a perfect therapy that will cure 100% of patients of depression. However, they recognize that there are things they can do that will be helpful for a certain percentage of their patients. They can conduct scientifically designed experiments, with control groups and everything, that demonstrate whether one particular therapy works better than another on average.

This is the power of indirect cause. Psychologists need not understand everything about how an ever-changing human brain works to discover that certain therapies work better than others.

SEO is similar. We may not have clean control groups, but we can measure impact. We can identify strategies that seem to work better than others. We can test.

Correlation studies are an easy out. They let us stop caring whether what we do actually works, and instead just copy what successful people have done. We don't care why those people were successful, and we don't ask ourselves whether it makes sense to copy them, and we don't ask whether the things they did actually contributed to their success in the first place.

At the end of the day, those correlation studies are what really convinced people that they needed to flood their site with spammy links. When some SEOs saw correlation studies showing how highly links were correlated with rankings, they spammed their sites. In some cases, they actually thought that's what Google wanted, because the correlation was there.

2. When we confuse correlations with indirect causes, we screw up cause and effect

This is a big one too. Let me give you some examples.

  • The correlation studies almost certainly overvalue the importance of links. Why? Because a site that ranks well in search results is also going to get linked to more often than a site which doesn't rank well. I'm not saying links don't help your rankings: of course they do. But rankings also cause people to link to you, so the strength of links as a factor is overrated when you just look at correlation.

  • We already know that Google doesn't use social media factors in the algorithm, but the correlation with rankings is high. Why? At least in part, it's because high rankings will cause more people to see your content, and some of them will share it on social networks. To some extent, it may also be because a social media presence helps create signals and indirectly causes your rankings to improve. It's dangerous, though, to rush to the conclusion that because this correlation exists, Google probably wants to rank the kind of site that will do well on social media. In fact, I think that's dead wrong, because Google loves Wikipedia and academic sites, and they're definitely not social.

We don't just get cause and effect backwards, either. I can guarantee that a lot of the factors we think are important only correlate with rankings because marketers are using them.

If marketers are using strategy X, and non-marketers are not using strategy X, and marketers tend to rank better than non-marketers, we will see strategy X correlate with rankings. In that case, all it means is that marketers use strategy X. It's not necessarily an indirect cause for or an indirect effect of rankings.
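This confounding pattern is easy to simulate. Here's a minimal sketch (all numbers invented, purely for illustration), assuming rankings are driven by marketing skill alone while "strategy X" is simply something marketers happen to use:

```python
import random

random.seed(0)

sites = []
for _ in range(10_000):
    is_marketer = random.random() < 0.5
    # Marketers adopt strategy X; non-marketers don't.
    uses_strategy_x = is_marketer
    # Rankings depend on marketing skill plus noise -- NOT on strategy X.
    ranking_score = (2.0 if is_marketer else 0.0) + random.gauss(0, 1)
    sites.append((uses_strategy_x, ranking_score))

x_scores = [score for used, score in sites if used]
no_x_scores = [score for used, score in sites if not used]

def avg(xs):
    return sum(xs) / len(xs)

print(f"avg ranking with strategy X:    {avg(x_scores):.2f}")
print(f"avg ranking without strategy X: {avg(no_x_scores):.2f}")
# Strategy X "correlates" with rankings, yet it has zero effect in this
# simulation: being a marketer drives both the strategy and the rankings.
```

A correlation study run on this data would conclude that strategy X is a strong ranking factor, even though the simulation never gives it any effect at all.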

Cause and effect, indirect or otherwise, are always more important than correlation.

Postscript: Michael Martinez also wrote a good post on the topic.

  • goodkarma23

Interesting thought experiment, Carter. I'd love to hear Rand's thoughts on it...he's a super smart guy, and I'd expect some fascinating data to defend his position.

• Rand is a super smart guy and a great marketer, but he's just factually wrong here. There's no data to defend his position because the examples he gives conflate indirect cause-and-effect relationships with correlation.

His article should have been titled "Why SEOs Should Care About Indirect Causation (Not Correlation)"; then it would have been spot on.

      • Carter Bowles

        Thanks for really nailing what bothered me about the article so concisely.

• No, thanks for your write-up. I think it's a great breakdown of the same issues I had with the article, and you touched on some important points.

In a way it's a really subtle critique, because I think we both agree with Rand's premise: trying too hard to figure out exactly what is going on in Google's black box can lead to claiming a cause-and-effect relationship when all we really have to go on is correlation.

At the same time, it's not just a semantic distinction; it's actually a very important one. People are going to use correlational data to draw conclusions about cause and effect (as Rand did in his examples), whether they're doing it consciously or not. It's a hard-wired heuristic we default to when we don't have enough data. But at least when you're aware of it, you can question your assumptions and guard against the dangers of equating correlation with causation.

  • Jeff Thompson

Interesting read. I think what maybe Rand was driving at, though, is that sometimes people use "correlation is not causation" as an excuse not to find out whether there is a causative effect (direct or otherwise).

To take the app example you highlight, I don't think Rand is mistaking correlation for indirect causation.

    The correlation occurs when Company A looks at the data and says - hmmmmm, 75% of our rivals who have an app outrank us for key terms (forgive me, it's a simplistic example), maybe we should have an app.

    There's no causation (direct or indirect) because, really, they don't have the data to prove whether their better ranking rivals are outranking them because of an app (sure, they can get some of it, but it's not easy and they're never really going to get the complete info they need.)

    Now at this stage someone could come along and say "yeah, but correlation isn't causation, we've got no proof that an app is even a contributing factor to us being outranked."

    If the company decides not to build an app based on that statement, then they'll never know if it could have indirectly improved their rankings.

If they do build an app, and track things properly, and it works, then they'll have enough information to say that the app had an indirect causative impact on their rankings. (And if it doesn't work, they'll still have evidence of a correlation but no proof that there isn't an indirect causative impact for their rivals: hey, maybe Company A just made a sucky app.)

    SEO correlation studies are bad, but they're even worse when they're used as a roadblock to stop people implementing ideas that will actually work. In my opinion, that's what Rand was saying.

    (As a final note, I think it's wrong to say, as you do, that SEO correlation studies overvalue the importance of links. They may overvalue the importance of the number of links, but that's different. Don't let all the talk of semantic search and machine learning fool you. If Google could dampen down quality links as a ranking factor to any noticeable degree, they wouldn't be taking action on such a granular level with things like anchor text.)

    • Carter Bowles

      "They may overvalue the importance of the number of links, but that's different."

      That's what I meant :)

      I don't know whether or not what you're saying is what Rand was getting at, but yeah, I agree. Like I said, correlation is an important jumping off point.

  • Kristine S

Correct, there is no causation in SEO, as you cannot gather the variables or use the methods necessary to establish causation. However, the idea mentioned here that correlation makes us not care, or is not incredibly helpful, is incorrect.

Entire fields of study, however, are based on correlation (Psychology, which you used as an example, and Sociology among them). So while I appreciate your passion and do agree Rand got it wrong, so did this article in that regard. Correlation can tell us a great deal about how variables work with each other and what is most likely affecting the other, to varying degrees of confidence. Methods of statistical analysis such as crosstabs, linear regression, and multiple regression are the mathematical basis for this type of study. However, you do have to make sure to adhere to best practices such as good variable definition, proper data collection, and proper sampling. Then you need to understand the limits of each type, along with other factors related to proper research methodologies.

Note: a good course in statistics that covers this type of data analysis is most helpful.

Note that "indirect" causation is correlation in a roundabout way.

    • Carter Bowles

      There's nothing here I would disagree with, and I had hoped that the article was clear about correlation studies being important as a jumping off point (and sometimes the only thing you can do). My point is that cause and effect are always more interesting and meaningful.

      I'm glad you brought up regression analysis, because it would be great if some of these studies started using it to control for variables, so we could at least rule out some of the more pointless spurious correlations that we can be sure are confounding these studies.
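To make the regression point concrete, here's a minimal sketch (all numbers invented, purely hypothetical) of how including a confounder as a predictor can deflate an inflated coefficient. Suppose "marketer skill" drives both link acquisition and rankings, while links also have a smaller direct effect of their own:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical data-generating process: skill drives both links and rankings;
# links also have a smaller direct effect (true coefficient: 0.5).
skill = rng.normal(size=n)
links = 1.5 * skill + rng.normal(size=n)
rankings = 0.5 * links + 2.0 * skill + rng.normal(size=n)

# Naive simple regression: rankings on links alone (slope is inflated,
# because links proxy for the omitted skill variable).
naive_slope = np.polyfit(links, rankings, 1)[0]

# Multiple regression: control for skill by including it as a predictor.
X = np.column_stack([links, skill, np.ones(n)])
controlled_slope = np.linalg.lstsq(X, rankings, rcond=None)[0][0]

print(f"naive link coefficient:      {naive_slope:.2f}")
print(f"controlled link coefficient: {controlled_slope:.2f}")
```

The naive slope comes out well above the true 0.5, while the controlled estimate lands close to it; that's exactly the kind of correction the industry's correlation studies could apply but generally don't.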

Something you mentioned also reminded me of another issue I forgot to address in this post. The correlation studies also suffer from a fundamental issue: their sample sizes can't be a representative sample of Google search queries, because only Google has that data.

  • Great post @carter, in fact it was shared on Inbound and @randfish has replied if you want to check it out, here is the link- http://inbound.org/articles/view/why-rand-fishkin-is-very-very-wrong-about-correlation

    • Carter Bowles

      Yep, I've already hopped into the discussion. Thanks for letting our readers know.

  • Grant Simmons

    Testing is important.
    Leveraging data, knowledge & gut to model tests.
    Defining success, failure or elements for more testing.
    As long as you know the difference in correlation & causation.
    And then test the crap out of it. I'm good with that.

  • Chase Anderson

    Best post I've read this year.

    Thank you!

• Just one question, Carter :) What is a better source of information than case studies? The problem is how trustworthy a source could be and how we can use it in our particular case.

    • Carter Bowles

The most reliable information is your own data, since you know it hasn't been cherry-picked, combined with the quasi-scientific mindset of trying to prove yourself wrong, not right.

Sure thing, but one must learn from other people's mistakes as well, as you won't have time to perform all the tests yourself.

        All the best

  • Pauline

When I first read Rand's article, as a French speaker, I thought correlation meant a combination of multiple facts... that would have been right then, wouldn't it?

  • MatthewWoodward

    He also said negative SEO didn't exist.