Picture this: it’s a boiling summer day in the north of Italy and I’m in a sweaty computer room taking the very last exam that separates me from the end of my education. Obviously, I left Statistics (‘for brain and cognitive science’) for last because I hate it with a passion. What’s this abstruse language called R, anyway? Why do I even care?
Little did I know that a few years after swearing I’d never touch Statistics (or R) again, I had to break my promise to show the impact of my work as an SEO professional to my stakeholders.
Yes, because you know that awesome feeling of having your changes implemented across several pages at once? It’s great, but in the corporate world it doesn’t happen very often – unless you make a case for it with a test and forecasted impact. You know, like one of those pretty graphs you see on LinkedIn or Twitter: the ones that unequivocally display a trendline going up or down right after a change (or a ‘treatment’, as we’ll call it from now on) to a set of pages, showing the impact it has made on performance.
So what if I told you that you can create your own analysis and wow stakeholders with one of those graphs - for free?
Get ready, because in this article you will learn how to do it with Causal Impact, a data analytics package developed by Google that lets you clearly display the effect of a change on a variable and helps you with business cases and decision making. And don’t worry, you don’t need to be a statistics or programming genius. Causal Impact on R is actually pretty accessible, and in this article you’ll find all you need to get started:
What is Causal Impact?
Why and when should you use Causal Impact?
How to run an analysis with Causal Impact in RStudio
The limits of statistics on real-world data
So let’s dive right in.
Causal Impact is a package for R that allows us to analyse a time series dataset and draw inferences about the causal effect of an intervention on a variable, as compared to the predicted outcome in the absence of a change. It also determines if the effect is statistically significant or not.
It’s based on a statistical model called Bayesian Structural Time Series (or BSTS for short), which uses prior information to predict the outcome of a variable in the absence of the treatment. This prediction is called the counterfactual.
It then compares the expected result (the counterfactual) with the actual outcome, and uses the difference to estimate the effect of the treatment.
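To make this concrete, here is a minimal sketch adapted from the CausalImpact package’s own documentation, using simulated data: a response variable `y` gets an artificial uplift of 10 in the last 30 days, and the model estimates that effect against the counterfactual. The variable names are illustrative, not from any real dataset.

```r
# Minimal CausalImpact sketch on simulated data (adapted from the package docs).
library(CausalImpact)

set.seed(1)
x1 <- 100 + arima.sim(model = list(ar = 0.999), n = 100)  # a control series
y <- 1.2 * x1 + rnorm(100)                                # the response (e.g. clicks)
y[71:100] <- y[71:100] + 10                               # simulate a treatment effect
data <- cbind(y, x1)

pre.period <- c(1, 70)     # before the 'treatment'
post.period <- c(71, 100)  # after the 'treatment'

impact <- CausalImpact(data, pre.period, post.period)
plot(impact)     # the three-panel graph
summary(impact)  # estimated effect and its significance
```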
Let’s see an example in practice:
Let’s say that you’re running a simple title change test on your pages, like adding a price indication, with the hypothesis that it will improve clicks:
Once you apply the change, the data you’ll have at hand will most likely come from Google Search Console, Google Analytics or some other tracking tool. While we can sometimes see a trend following our change on these platforms, it can be difficult to quantify exactly how much the change affected the variable of interest (clicks, in this case).
What Causal Impact does in practice is help overcome this challenge: it takes our data, analyses the performance from the period before the change was implemented and the performance after, and gives you a graph that makes it very easy to understand the impact of the treatment you applied to your test group:
Example of a positive test analysed via Causal Impact
Here’s what each of the panels means:
1) The first panel shows the original data along with the prediction in the absence of treatment:
a. The full black line is the observed data (the actual clicks recorded).
b. The horizontal dotted line represents the counterfactual prediction (what the model expects would have happened without the change).
c. The vertical dotted line represents the date the treatment was launched.
d. The blue area around the lines is the fluctuation, the 95% credible interval around the prediction.
2) In the second panel we can see the difference between the observed data and the counterfactual prediction. This is the pointwise effect of the change.
3) The third panel adds up the pointwise contributions from the second panel to calculate the cumulative effect of the change. This is your total uplift (or loss) over the post-period.
The graph is great for visuals, but the real value is given by the detailed summary we obtain in addition:
If the ease of reporting has not convinced you yet to give it a go, let’s see a few more reasons to try it out.
Quantifying an effect and understanding its statistical significance is beneficial for a number of reasons in our job, but in particular to streamline decision making for scaling up our changes with confidence.
Running tests is difficult enough as it is, after all. When I was just starting out talking about tests, I ran a poll on Twitter and LinkedIn, asking what my peers found to be the most challenging part of running one: 12% said choosing their test group was the hardest part, 38% said they had trouble analysing and trusting the test results, 25% said they struggled applying their findings at scale, and 25% said something else, which included ‘resources, costs and business buy-in’ – all things that Causal Impact can help with.
(Bear in mind: I didn’t have a huge number of followers at the time, so take those numbers with a grain of salt, but nevertheless, I think the results are indicative of the number of challenges we run into when running tests).
So there are three main reasons why I suggest you give Causal Impact a shot:
Whether you work inhouse or for an agency, your stakeholders will most likely ask for a forecast of the estimated upside before they spend money, time and resources on any initiatives, and as SEOs it can be tricky to provide clear answers.
However with Causal Impact, you can get rid of the blanket ‘it depends’ answer and provide a data-informed forecast with a degree of confidence based on the results you obtained, so that you can drive changes at scale.
You can use Causal Impact on a number of other datasets as long as they’re in a time series. For example, William Martin from UCL used it to estimate the effect of app changes on installs, and the hosts of the podcast DataSkeptic analysed the causal impact of Adele’s appearance on Saturday Night Live on visits to her Wikipedia page.
You can also try Causal Impact to measure the effect of influencer campaigns and other types of initiatives without clear tracking, to help you quantify results. (NB whilst you can use Causal Impact on almost any time-series data, you might want to focus on upper funnel metrics like site views, sessions and users, since for deeper metrics there are a number of layers that might influence performance that could be difficult to rule out.)
One of the biggest blockers for running tests is cost and resource. There are a few (great) tools around now that integrate Bayesian statistics to measure the impact of a split test, and while they’re quicker to run, if budget is an issue this is a great workaround that allows you to get your testing under way.
Here are just a few of the scenarios where using Causal Impact can help us:
It can be used in any area where a change is applied at a known point in time and you have time-series data to measure it
It’s great to clearly quantify and visualise the effect of a change for your stakeholders
It can help with business cases, forecasts and decisions about scaling your changes
And even when the test is a negative (or a ‘loser’) it can contribute to our learnings about our audience and help us direct our future efforts more strategically.
Hopefully I’ve now convinced you to try it yourself, so it’s time for a demo!
First, you’ll need to download R and then download RStudio. Both of them are free and open-source, so you’ll find plenty of guidance online, and continuous improvements are made to the scripts by different contributors.
Once you’re done, open the software and you’ll see a screen that looks a bit like this:
On the right-hand side there’s a Packages tab. Go to ‘Install’ and search for CausalImpact. Once you’ve installed it, all the libraries you need will be right there, so you’re ready to use the platform with your dataset of choice.
For simplicity, I’m using sample data from Google Search Console here, but you can use any of your usual sources to extract the data of interest. If you don’t have access to data of your own, you can also create synthetic data to practise (Marco Giordano has a guide on how to do it).
Because the model needs enough time to reliably estimate the counterfactual (the prediction of the performance in the absence of a change) based on previous performance, you’ll need a pre-period at least twice as long as your post-period.
Let’s take the example of a title change test that you want to analyse after the two-week mark: you’ll need to export 6 weeks’ worth of data (4 weeks pre-launch and 2 weeks post-launch) to allow the script to run correctly.
The download from GSC will look like this:
With the script I’m using, you’ll need to focus on one specific variable at a time. In this example, I’m looking at clicks, and since the GSC export includes additional metrics too, there are a few things I need to do in order to get my dataset ready for analysis:
Correct the zeros: this is only viable if you have one or two instances in the dataset. In this case you can ‘borrow’ a unit from the row before or after, making sure it doesn’t overlap with your test launch date (as you’d be moving values across two different periods). This will allow the script to run with minimum data manipulation.
Choose a different metric of interest or test group: if you have multiple zeros, the test group likely is not a good fit for analysis yet. It’s not worth using that data to run the test, as the model won’t have enough figures to work with and correcting all of the instances would involve heavy data manipulation which defeats the purpose of having a reliable output in the plot.
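If you prefer to do the ‘borrowing’ in R rather than in your spreadsheet, a minimal sketch could look like this (assuming a single zero that isn’t in the first row and doesn’t sit on your launch date; the example values are made up):

```r
# Borrow one click from the previous day to remove a single zero.
clicks <- c(12, 0, 15, 18)           # example daily clicks with one zero
i <- which(clicks == 0)              # only viable for one or two instances
clicks[i] <- clicks[i] + 1          # give the zero day one click...
clicks[i - 1] <- clicks[i - 1] - 1  # ...borrowed from the day before
```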
Your end result should look like this:
The hard part is over. Once you’ve saved this as an Excel file, you’ll be ready to run the script!
Import your Excel file and check the preview to make sure all your data looks correct.
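You can use RStudio’s ‘Import Dataset’ button for this, or do it in code; a sketch of the code route, using the readxl package (the file name here is just a placeholder for your own export):

```r
library(readxl)                        # install.packages("readxl") if needed
data <- read_excel("gsc_clicks.xlsx")  # placeholder file name - use your own
head(data)  # sanity-check the preview: date column first, then your metric(s)
```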
Once the data is loaded into the main environment, find your test launch date (e.g. May 1st) so you can map the rows corresponding to your pre and post periods.
For simplicity, I’m going to use a sample that runs over 6 weeks (4 of pre-period, 2 of post-period) so you can follow along in the script. In my dataset, the launch date corresponds to row 29, so I know that my pre-period runs from row 1 to 28 and my post-period from row 29 to 42.
We’ll now move to the console in the bottom left of your RStudio environment, where we’re going to work through this script:
Start by typing out the first command to state where your pre and post periods start and end:
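Using the row numbers from the example above, that first command might look like this (adjust the indices to your own launch date):

```r
pre.period <- c(1, 28)    # rows before the launch (4 weeks of pre-period)
post.period <- c(29, 42)  # launch day through the end of the post-period
```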
At this point, you can remove the date column, since you don’t need it anymore to map your pre and post launch periods. This will bring your test group column to the front, ready to be analysed and compared to the controls.
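A simple way to drop the date column (assuming it’s the first column of your data frame):

```r
data <- data[, -1]  # remove the first (date) column; the test group moves to the front
```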
And now, it’s time to plot the impact and watch the magic happen!
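Fitting the model and plotting takes two lines, using the package’s documented `CausalImpact()` and `plot()` calls with the `data`, `pre.period` and `post.period` objects prepared above:

```r
library(CausalImpact)
impact <- CausalImpact(data, pre.period, post.period)
plot(impact)  # draws the three-panel graph
```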
Now that you have that pretty graph I promised you, the next command comes in handy to actually understand what it means, by quantifying the impact and the confidence level of the statistical inference.
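Assuming your fitted object is called `impact`, as above, the command is:

```r
summary(impact)  # tabular summary: absolute and relative effect, credible interval, p-value
```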
And for a written summary of the values above that you can use in your reports, you can input the line below:
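This is the package’s built-in report mode, which prints the same results as prose:

```r
summary(impact, "report")  # plain-English write-up you can adapt for stakeholders
```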
Finally, if you included one or more control groups in your analysis, you can use this command to understand the probability of them being used by the model, and how much of a good fit they were for it.
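The inclusion probabilities come from the underlying bsts model; the CausalImpact documentation suggests this call:

```r
plot(impact$model$bsts.model, "coefficients")  # one bar per control: inclusion probability
```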
The closer the bar is to 1, the higher the chance that the control was used by the model as a predictor. A control closer to 0 is less informative for the model.
The colour indicates the direction of the coefficient:
white for negative (the coefficient goes in the opposite direction as your variable of interest)
black for positive (the coefficient goes in the same direction as your variable of interest)
grey for the same probability of being negative or positive.
While control groups are not mandatory to complete your analysis, I find them useful to strengthen the case for the impact your change makes (especially in those cases when the coefficient is closer to 1 and is negative, showing that the impact is more likely due to the treatment rather than the general trend of the website).
And that’s your first analysis done! Here’s a summary of all the commands we’ve seen (the fields in red will have to be replaced with your own data):
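For reference, here is the whole walkthrough gathered into one script. The file name and row indices are the placeholders you’ll need to replace with your own data:

```r
library(readxl)
library(CausalImpact)

# -- Replace these values with your own --
data <- read_excel("gsc_clicks.xlsx")  # your exported dataset
pre.period <- c(1, 28)                 # rows before your launch date
post.period <- c(29, 42)               # rows from your launch date onwards

data <- data[, -1]                     # drop the date column

impact <- CausalImpact(data, pre.period, post.period)
plot(impact)                           # three-panel graph
summary(impact)                        # quantified impact + significance
summary(impact, "report")              # written summary for your reports
plot(impact$model$bsts.model, "coefficients")  # control inclusion probabilities
```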
Bear in mind that whilst this script is a good place to start, you can find versions of Causal Impact scripts all over the web, so once you get comfortable with this one you can explore all of the other things that RStudio has to offer.
A few things I learnt from running this script over and over:
The values in your columns will often not be recognised if they contain thousands separators (e.g. 1,204), so if you get an error, make sure you replace the number format in your cells (e.g. 1204).
Start small, then expand (so you can start with a simple pre-post without control groups just to get comfortable with the analysis, and then you can add control groups, use more advanced scripts and so on).
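If you’d rather fix the comma issue in R than in your spreadsheet, one line does it:

```r
raw <- c("1,204", "987", "2,310")        # values that were read in as text
clean <- as.numeric(gsub(",", "", raw))  # strip the commas, convert to numbers
```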
Now that you learnt how to use Causal Impact, I need to warn you about one thing: statistics are not infallible. This quote summarises the dangers of using statistics recklessly to interpret real-world data:
“Statistics can be made to prove anything – even the truth.” - Noel Moynihan
With marketing tests, we’re not working in a sterile lab environment where most variables are controlled – we’re working with real-world data, and in the digital landscape a lot can (and will!) impact your data, so there are a few things you need to be mindful of:
Google algorithm updates (official ones are now reported in Google Search Status Dashboard)
Tool tracking failures (both proprietary and third-party tools can have service outages)
Engineering releases that overlap with your test dates, which make it hard to isolate the impact of your change to just the variable of interest
My suggestion is to keep a working document with the rest of your team where you can map all of the known internal changes (dates of your tests, dates of other initiatives and engineering releases) as well as the external events that might affect your data. This will allow you to take into account any known factor during the timeframe of your test and isolate, as best you can, the effect of your treatment on the variable you want to analyse.
Outliers can massively affect your data. Take this example:
This, in theory, is a great positive result. However, you can see very clearly from the first panel that there’s something not quite right with the data, and that this spike is likely caused by one or more outliers.
Outliers can originate from a number of instances, but most commonly:
New product launches within the test group
Holidays and seasonal events (think Black Friday, for example)
Results coming only from one page in the test group
Bots hitting your pages
And here are a few things you can do to minimise their occurrence:
Example of the previous test once outlier values were removed from the dataset: it’s inconclusive.
We’ve all been there: you came up with the idea for the test, you feel strongly about your hypothesis and the change you’re implementing – therefore you really want the test to work out and prove your point. But what if it doesn’t? What if it’s an inconclusive – or worse – a loser?
Don’t despair. We can still learn from tests that don’t work out (I think that’s what my horoscope says every other day, at least). You have a couple of options in this case: you can let the test run a little longer, or repeat the test with bigger groups. Both allow more data to be gathered, giving an effect a better chance of showing.
However, if after these tweaks the test is still inconclusive or negative, then you’ll just have to take your losses (and your learnings), revert the change and focus your efforts on other tests instead. That’s the reason they call them ‘test & learns’, after all.
And that’s about it!
I hope you find Causal Impact as useful as I have. Will it work for you? I’d normally say ‘it depends’, but this time I really don’t think it does.
Here’s some more useful resources you might like to check out: