In previous posts, we introduced pipe, the Automattic Machine Learning Pipeline. Part one introduced pipe and how it works. Part two made a case for internal ML pipelines and delved deeper into pipe’s technical infrastructure.
Upcoming pipe posts on data.blog will showcase how we use pipe at Automattic to inform decision-making and marketing efforts.
Marketing Science at Automattic
When the previous posts about pipe were published, Yanir and I were developing pipe on the Marketing Data team, which was part of the Marketing division. Since then, we have returned to the Data division and formed a new team called Marketing Science, with the mission of using science to empower marketing.
What is Marketing Science anyway?
My favorite description so far comes from the Marketing Science team at Facebook: Marketing Science transforms marketing efforts so that they are grounded in data and science.
My emphasis in that definition is on the word transforms. Integrating data and scientific methods into decision-making processes is not a straightforward or easy task. Everyone wants data and “truths” they can trust, but short-term business goals and already established methods do not necessarily align with the long-term efforts required to get there. The path to making existing efforts grounded in data and science takes a lot of time and effort spent on changing objectives, tooling, established practices, and culture.
A case study from Marketing Science: Optimizing discount campaigns via ML
While we work to scale our tooling and align more closely with our stakeholders, we also run experiments to see the effect of different marketing interventions, analyze how they could be more successful, and use insights to improve results.
As part of our previous RR++: Increase Retention and Revenue project, we devised a small intervention.
The goal: Increase retention of and revenue from our paid plan users while making sure we only reach out to those users who would like to receive the email offer.
Method: A discount campaign via emails to Day 30 users (users who bought a plan 30 days ago), combining:
- A churn model built with our internal ML pipeline pipe that predicts the likelihood of a user going from a paid plan to a free plan;
- Two emails provided by the marketing team, one with a discount offer and one without a discount offer; and
- Data scientist and marketer collaboration.
The beauty of segmentation: sure things, lost causes, sleeping dogs, and persuadables
Normally, people think of email campaigns as a low-cost or no-cost intervention. However, blindly and frequently emailing customers would cost us loyalty and subscribers. Automattic’s Marketing team takes great care to “build our business sustainably through passionate and loyal customers.” This means that while the team would like to make sure that customers are informed and aware of offers, we also try to make sure that we do not email people too often.
90 percent of UK consumers have unsubscribed from communications from retailers in the past 12 months, with nearly half saying this is because they received too many messages from brands.
Furthermore, when you email users a discount offer, anyone who would have bought at full price but instead buys at the discounted price costs you revenue. Marketing segmentation commonly divides customers into four groups:
- The customers who will buy regardless of whether you send them an offer or not: the sure things.
- The customers who will refuse to buy with or without the offer. They have moved on! They are the lost causes.
- The customers who will actually react negatively to receiving an offer. They would have renewed a subscription or bought a product had you not had the audacity to email them about it. We must let sleeping dogs lie.
- The persuadables — the holy grail of marketing segmentation, those users who are on the fence. They will only buy if you reach out to them.
Clearly, as a marketer, you don’t want to wake sleeping dogs and you also don’t want to waste money on treating customers that will never buy or that will always buy. This leaves you with only one interesting group to treat: The persuadables. These are the customers that show an increased propensity to buy if they are treated with a marketing activity.
Define success, measure effect, and increase impact
Ideally, data should be able to help identify persuadables: users who will only buy if they receive a discount. We defined success as being able to identify these users and make our campaign for increasing retention and revenue as efficient as possible.
An important thing to mention here is that our model predicts whether users will churn or not. It does not predict whether they will respond to the marketing effort or not. The true value of uplift modeling comes from being able to build models based on a user’s likelihood to churn or purchase as a response to a marketing campaign. We will write about our work on uplift modeling in upcoming installments.
But our hypothesis is that by segmenting users by their likelihood to churn and setting up well-designed experiments, we will be able to find those who would take us up on our offer even without a discount and exclude them from our targeting, thus increasing incremental revenue per user.
The next step is to devise an experiment to measure success. Causal inference either via controlled or natural experiments is probably the single most important and highly used tool for a marketing scientist. Experimental design is not always intuitive and it takes domain knowledge and a lot of deliberate thinking to set up A/B tests that actually answer the right questions.
Our experiment setup: Segments
As mentioned, we have segments provided by our churn model. The churn model produces raw scores based on a user’s likelihood to churn, and we can bucket these scores into quantiles to come up with segments: for example, the top 25% of users most likely to churn, the bottom 25% of users least likely to churn, and so on. If you have more users, you may want to bucket the scores into deciles instead to get a more granular view.
We end up with four non-overlapping segments to analyze, plus an overall segment which includes everyone, so that we can see the impact of segmentation. (Maybe the churn scores provide sub-optimal segments and it’s better to send our offer to everyone without any segmentation. We’ll find out soon.)
For each of the segments, we put half the users in the control group and half in the test group. We will then see how the offer affects users differently based on which segment they are in.
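As a rough sketch of the bucketing and split described above — with made-up user IDs and random scores standing in for real churn-model output — the segmentation and stratified test/control assignment might look like this in pandas:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical data: raw scores standing in for the churn model's output.
users = pd.DataFrame({
    "user_id": np.arange(1000),
    "churn_score": rng.random(1000),
})

# Bucket scores into quartiles: Q1 = least likely to churn, Q4 = most likely.
users["segment"] = pd.qcut(users["churn_score"], q=4,
                           labels=["Q1", "Q2", "Q3", "Q4"])

# Within each segment, randomly assign exactly half the users to the test
# group and half to the control group.
def split_half(group):
    assignment = np.where(np.arange(len(group)) % 2 == 0, "test", "control")
    return rng.permutation(assignment)

users["group"] = (users.groupby("segment", observed=True)["user_id"]
                       .transform(split_half))
```

With 1,000 users this yields four segments of 250 users each, split 125/125 into test and control, which is what lets us compare the offer's effect segment by segment.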
Our experiment setup: test vs. control treatments
Another important aspect of experimental design is to determine an appropriate control group treatment for what we are trying to test. Traditionally, A/B tests around discount campaigns send half the users an email with a discount, while the other half receive nothing.
However, this would be the wrong setup since we are interested in the impact of the discount itself. If the control group did not receive anything at all, we would still not understand whether our customers made a purchase because there was a discount or simply because they got nudged.
This is why for this experiment we came up with two emails:
- Test email: An offer for one of our plans along with a discount code.
- Control email: An offer for the same plan without the discount code.
Our experiment setup: Metrics
A note on incrementality: Let’s say you run a campaign that sends an email to 100 users and no email at all to a similar set of 100 users. You find that those who received the email spent $1,000 USD, so your net revenue for the campaign is $1,000 USD. The users who did not receive the offer, however, spent $1,200 USD, which means that you actually lost money by sending the email: your incremental revenue is -$200 USD.
It is an industry-wide standard to report net revenue, as opposed to incremental revenue, as one of the higher-level metrics. This makes sense in some scenarios. However, I have strong advice for every marketer in charge of campaigns: never even glance at net revenue numbers when evaluating campaigns.
When it comes to analyzing the results, we relied on incremental average revenue per user (ARPU) as our primary test metric for each of the segments.
We then multiply the ARPU by the number of eligible users in each segment so we get a crude estimate of the overall incremental revenue we would get on a monthly basis if we rolled the discount campaign out to the given segment.
So if the 95% confidence interval for the incremental ARPU for a segment is between $0.30 USD and $0.50 USD and we have 1,000 users in that segment, then we would expect that segment to bring in $300 USD – $500 USD in incremental revenue.
It is important to note that, for example, Quantile 4 (the top 25% of users most likely to churn) has only one-fourth of all users; it is a smaller segment than the no-segmentation group, which includes everyone. This means that in certain cases one segment may perform better in terms of ARPU, but if that segment has very few users, it may not be the segment that makes the most sense to target.
We first looked at this as if it were a simple test, without any segmentation. We found that the incremental ARPU was negative, with a 95% confidence interval showing the result was not statistically significant.
Normally, the test would be considered unsuccessful and we would stop here, as the offer is potentially losing money. However, the segmentation made possible by the churn likelihood scores painted a different picture. We found that not only is the offer successful for those deemed most likely to churn, but targeting only this group also makes this campaign one of our most efficient ones.
We are able to maximize incremental revenue by only emailing half of our eligible users. Emailing all of them would potentially lose us money.
This is in line with our stated goal of increasing retention rates and revenue by only emailing those users who would potentially like to hear from us as opposed to targeting (and annoying) every possible customer.
- If we send a discount to everyone without segmentation we are potentially losing money.
- We would not have been able to see this had we kept the control group as an absolute holdout and not sent them an email without a discount.
- We can see that there are users here who respond really well to an email without a discount so we are potentially losing money by giving them a discount.
- Those less likely to churn are also not responding well to the discount; they may be just as likely to take us up on the offer without one.
- The one combination of segments where we see a statistically significant increase in revenue from offering a discount is the top two segments of users most likely to churn.
- These users only purchase if offered a discount.
- Neither Quantile 3 nor Quantile 4 showed statistically significant results individually, which is why it also makes sense to look at sensible combinations of segments.
- We increase incremental revenue of the campaign while emailing only half of our eligible customers.
- As mentioned, if we were to look at net revenue, we would make exactly the wrong decision.
- If we relied on net revenue, even campaigns that are demonstrably losing money could report very high net revenue numbers.
Once we have a successful A/B test result that is rooted in careful experimental design, we are ready to roll out the campaign. The campaign is constantly monitored by holding out a small group, to make sure that it performs as well in the wild as it did during the testing phase.
Any such test or predictive model will only be as useful as the marketing domain expertise and collaboration that powers it. The data and model scores rely on strong marketing strategies and on continually testing carefully crafted offers. The experiment was also a good way to build and strengthen relationships between our data-driven marketers and data scientists.
Future impact comes from being able to iterate on results. This means that we can now work even more closely with our marketing stakeholders to widen our net and start playing around with a family of offers and machine learning models that can pick the right offer for each user. While Quantiles 1 and 2 in our small experiment may not have been the right target group for a discount offer, they may turn out to respond well to different campaigns.
In the next installments on pipe, we will look into larger campaigns, discuss our secondary test metrics, show how likelihood scores can end up providing better segments for campaign optimization than heuristics-based options (e.g., “users that added an item to their cart” or “users that logged in in the past N days”), and explore how machine learning can help pick which of many different offers to send to each user (or maybe no offer at all) without overwhelming them.