
Estimating the Effectiveness of Advertising on Revenue Using Causality

Posted by Gijs Joost Brouwer on September 3rd, 2014 at 11:46 am

Did you know that ice cream causes fires? True story. We crunched the numbers and concluded that on days when people eat a lot more ice cream, there are a lot more fires. We thought that this was a very nifty discovery until somebody pointed out that they thought fires actually cause people to eat ice cream. 'Nonsense!' we proclaimed, but ran the numbers anyway. Lo and behold, we found that when there are a lot of fires, people tend to eat more ice cream.

Let's be honest. We know ice cream does not cause fires (it doesn't run around mischievously with a box of matches) and fires do not make people eat ice cream (they usually make people call 911). So what is going on? Well, the observed correlation between fires and ice cream is actually caused by something that we have not yet observed. And that thing is of course the weather: in the summer people eat more ice cream AND there are a lot more fires. The take-home message: just because two things correlate doesn't necessarily mean that one causes the other. Something else entirely could be causing both. Or as statisticians like to say: 'correlation does not imply causation.'
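To make the ice cream story concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: the temperature range, the slopes and the noise levels are assumptions, not real data. Temperature drives both ice cream sales and fires, and neither one affects the other. The raw correlation comes out strongly positive, but once the linear effect of temperature is removed from both series, the remaining correlation is close to zero.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after removing a least-squares line fitted on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)

# Hypothetical daily data: temperature is the hidden common cause.
temperature = [rng.uniform(0, 35) for _ in range(5000)]
ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temperature]  # depends on weather only
fires = [0.5 * t + rng.gauss(0, 5) for t in temperature]       # depends on weather only

raw = pearson(ice_cream, fires)
adjusted = pearson(residuals(ice_cream, temperature),
                   residuals(fires, temperature))

print(f"raw correlation:               {raw:.2f}")
print(f"after controlling for weather: {adjusted:.2f}")
```

The adjustment only works here because the common cause was recorded. In the story above, the weather was exactly the thing nobody had looked at.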

Well, what does any of this have to do with digital advertising? A lot, as it turns out, especially when it comes to estimating the effect of advertising on revenue. This is occasionally done using what is called an A/B test: consumers are randomly selected to either receive the advertisement or not. This random selection (flipping an imaginary coin to decide) is crucial. Why? Well, let us think of another way of deciding who gets an ad. One thing we could do is show the ad only to men. Now imagine we find that the group that received the ad purchased the associated product more often than the group that did not. Was this because they saw the ad, or because they were men? You can't tell! The fact that only men saw the ad biases your estimate of the ad's effectiveness.

You might think that deciding who receives an ad based on gender or any other user characteristic is a rather silly, artificial situation. Far from it. Advertisers have created sophisticated ways of targeting consumers that inadvertently create these biases. For example, a consumer who visits a lot of video-game-related content will probably be receptive to an ad for an upcoming video game release. But hold on. Isn't it safe to assume that the same consumer already has a higher probability of buying that new video game, without ever seeing the ad? Definitely. An interest in video games becomes what we call a confounder: it increases both the probability that advertisers show you a video game ad and the probability that you buy that video game. So, when that consumer ends up actually buying the video game, we cannot tell how much of that purchase to attribute to the pre-existing interest and how much to the ad. Put more bluntly: we don't know whether the ad had any effect on consumer behavior.
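Here is a small sketch of how much targeting can distort the picture. All numbers are hypothetical: 30% of consumers are gamers, gamers buy with probability 0.20 and everyone else with 0.02, and the ad adds 0.02 to the purchase probability for anyone who sees it. The function `simulate` and the constant `TRUE_LIFT` are names invented for this example. The same population is run twice, once with targeted delivery (gamers are far more likely to see the ad) and once with a coin flip.

```python
import random

TRUE_LIFT = 0.02  # assumed true causal effect of the ad on purchase probability

def simulate(n, p_ad_gamer, p_ad_other, seed):
    """Return the naive ad effect: purchase rate with ad minus rate without ad."""
    rng = random.Random(seed)
    # saw_ad -> [number of purchases, number of consumers]
    counts = {True: [0, 0], False: [0, 0]}
    for _ in range(n):
        gamer = rng.random() < 0.3
        # Delivery rule: who gets the ad may depend on the gaming interest.
        saw_ad = rng.random() < (p_ad_gamer if gamer else p_ad_other)
        # Baseline interest plus the (small) true effect of the ad.
        p_buy = (0.20 if gamer else 0.02) + (TRUE_LIFT if saw_ad else 0.0)
        bought = rng.random() < p_buy
        counts[saw_ad][0] += bought
        counts[saw_ad][1] += 1
    rate_ad = counts[True][0] / counts[True][1]
    rate_no_ad = counts[False][0] / counts[False][1]
    return rate_ad - rate_no_ad

# Targeted campaign: gamers see the ad 90% of the time, others 10%.
targeted = simulate(200_000, p_ad_gamer=0.9, p_ad_other=0.1, seed=1)
# A/B test: everybody has the same 50% chance of seeing the ad.
randomized = simulate(200_000, p_ad_gamer=0.5, p_ad_other=0.5, seed=1)

print(f"true lift:              {TRUE_LIFT:.3f}")
print(f"naive, targeted data:   {targeted:.3f}")
print(f"naive, randomized data: {randomized:.3f}")
```

Under these assumptions the targeted comparison overstates the ad's effect several times over, because the ad group is mostly gamers who would have bought anyway. The randomized comparison lands near the true lift.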

Specific targeting is ubiquitous in today's digital advertising. That means most (if not all) data collected during an advertising campaign (when we are not running an A/B test) is likely to be extremely confounded. But not all is lost. A new and exciting statistical method is emerging, developed especially to tackle these biased data sets: the causality framework. The amazing thing about this methodology is that it controls for biases by building something akin to a statistical time machine. For every purchase that was preceded by an advertisement, we can replay time and ask the question: would that purchase still have happened if the advertisement had not been shown? If the purchase would still have happened, the ad and the purchase were merely correlated. On the other hand, if the purchase would not have happened without the ad, we know that the ad had a causal effect on the purchase: it increased the probability of a purchase. This creates a cutting-edge, actionable metric that allows advertisers to determine which ads have a causal, not just correlational, relationship with future purchase behavior.
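The post does not spell out how the "time machine" is built, so the sketch below should not be read as the method described here. It illustrates one of the simplest adjustment techniques, stratification: compare consumers who saw the ad with consumers who did not, but only within groups that share the same confounder value, then average those differences weighted by group size. It assumes the confounder (a hypothetical "gamer" flag) was recorded and that there are no other hidden confounders. The data and all probabilities are simulated and made up.

```python
import random
from collections import defaultdict

TRUE_LIFT = 0.02  # assumed true causal effect of the ad

# Simulated targeted campaign; each record is (gamer, saw_ad, bought).
rng = random.Random(2)
records = []
for _ in range(200_000):
    gamer = rng.random() < 0.3
    saw_ad = rng.random() < (0.9 if gamer else 0.1)  # targeting on interest
    p_buy = (0.20 if gamer else 0.02) + (TRUE_LIFT if saw_ad else 0.0)
    records.append((gamer, saw_ad, rng.random() < p_buy))

def purchase_rate(rows):
    return sum(bought for _, _, bought in rows) / len(rows)

def naive_effect(rows):
    """Compare everyone who saw the ad with everyone who did not."""
    with_ad = [r for r in rows if r[1]]
    without_ad = [r for r in rows if not r[1]]
    return purchase_rate(with_ad) - purchase_rate(without_ad)

def stratified_effect(rows):
    """Compare like with like: ad vs. no ad within each confounder group."""
    strata = defaultdict(list)
    for r in rows:
        strata[r[0]].append(r)  # group by the gamer flag
    effect = 0.0
    for group in strata.values():
        weight = len(group) / len(rows)
        # Within a group, those who did not see the ad stand in for
        # "what would have happened without the ad".
        effect += weight * naive_effect(group)
    return effect

naive = naive_effect(records)
adjusted = stratified_effect(records)

print(f"true lift:           {TRUE_LIFT:.3f}")
print(f"naive estimate:      {naive:.3f}")
print(f"stratified estimate: {adjusted:.3f}")
```

Within each group, the consumers who did not see the ad play the role of the replayed timeline. With many confounders, exact grouping like this quickly runs out of data, which is one reason more elaborate methods exist.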
