Customers play an important role in every business, and understanding their behavior can lead to meaningful insights for the business. One of the tools that has long been used to understand customer behavior is cohort analysis. In this post, we will briefly walk through a cohort analysis example.
Let us begin by understanding what exactly a cohort is. A cohort is simply a group of people who share a characteristic. This could be their spending pattern or the date on which they were onboarded to a platform.
For example, consider companies with large datasets, such as Uber or OLA. For these companies, a cohort could be the group of people who joined the platform on a particular day. Cohort analysis comes in handy for understanding how good the business is at retaining people on its platform.
A typical data set for such analysis would be as shown below.
This dataset consists of an order ID, the date of the order, the charges, and other specifications. The whole process of doing a cohort analysis can be broken down into the following steps.
- Determine the time interval of monitoring the cohort
- Determine the cohort group of users
- Prepare the data set with the cohort period
- Display the result
In this cohort analysis example, we will explore only one possibility: monitoring the retention of users. However, the same analysis can lead to other insights, such as monitoring revenue over time, i.e. checking what percentage of each cohort's revenue returns in subsequent periods.
Let us briefly take a look at each of these steps.
Cohort analysis Example
1. Determine the time interval of monitoring a cohort
The data in our cohort analysis example comes from a food delivery store, so in this case let us monitor the cohorts on a monthly basis. Using the data given above, we create a period column (OrderPeriod) that holds the year and month of each order. After this first manipulation, the data looks as follows.
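As a rough illustration of this step, here is a minimal pandas sketch. Only OrderPeriod is named in this post; the other column names (OrderId, UserId, OrderDate, TotalCharges) and the sample rows are assumptions made for the example and may differ from the actual dataset.

```python
import pandas as pd

# Hypothetical sample rows; column names other than OrderPeriod are assumed.
orders = pd.DataFrame({
    "OrderId": [1, 2, 3, 4],
    "UserId": [10, 10, 11, 12],
    "OrderDate": pd.to_datetime(
        ["2009-01-11", "2009-02-03", "2009-01-20", "2009-03-05"]
    ),
    "TotalCharges": [50.0, 26.0, 14.5, 33.0],
})

# Keep only the year and month of each order as a "YYYY-MM" string.
orders["OrderPeriod"] = orders["OrderDate"].dt.strftime("%Y-%m")

print(orders)
```

Storing the period as a "YYYY-MM" string keeps it sortable in chronological order, which is convenient for the grouping done in the later steps.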
2. Determine the cohort group of Users
The cohort, in this case, is the group of users who placed their first order in a particular month. Determining it is very simple. All you have to do is group the data by user, take each user's earliest order date, and keep only the year and month. After this transformation, we have the following data.
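One possible way to express this in pandas is sketched below. It assumes that a user's cohort is the month of their earliest order; the column names UserId, OrderDate and CohortGroup, and the sample rows, are illustrative assumptions rather than the dataset's actual schema.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumed for illustration.
orders = pd.DataFrame({
    "UserId": [10, 10, 11, 12, 11],
    "OrderDate": pd.to_datetime(
        ["2009-01-11", "2009-02-03", "2009-02-20", "2009-03-05", "2009-04-01"]
    ),
})
orders["OrderPeriod"] = orders["OrderDate"].dt.strftime("%Y-%m")

# For every row, look up the earliest order date of that row's user.
first_order = orders.groupby("UserId")["OrderDate"].transform("min")

# The cohort group is the year and month of that first order.
orders["CohortGroup"] = first_order.dt.strftime("%Y-%m")

print(orders)
```

Using `transform` keeps one row per order, so each order carries both its own period and the cohort of the user who placed it.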
3. Prepare the dataset with the cohort period
Now, all we have to do is put the data in the required format before we can draw insights from it. The small manipulations that we have to do to the data are as follows:
- Aggregate users, orders, and amount spent by cohort group for each month.
- Label the cohort period.
Once this is done, the data will look something like this.
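The two manipulations above could be sketched in pandas roughly as follows. The input column names and the output names TotalUsers, TotalOrders and CohortPeriod are assumptions for illustration, and the sample rows are made up.

```python
import pandas as pd

# Hypothetical orders that already carry OrderPeriod and CohortGroup.
orders = pd.DataFrame({
    "OrderId": [1, 2, 3, 4, 5, 6],
    "UserId": [10, 11, 10, 12, 11, 12],
    "TotalCharges": [50.0, 20.0, 30.0, 15.0, 25.0, 10.0],
    "OrderPeriod": ["2009-01", "2009-01", "2009-02", "2009-02", "2009-03", "2009-03"],
    "CohortGroup": ["2009-01", "2009-01", "2009-01", "2009-02", "2009-01", "2009-02"],
})

# Aggregate users, orders and amount spent per cohort group and month.
cohorts = (
    orders.groupby(["CohortGroup", "OrderPeriod"])
    .agg(
        TotalUsers=("UserId", "nunique"),
        TotalOrders=("OrderId", "nunique"),
        TotalCharges=("TotalCharges", "sum"),
    )
    .reset_index()
)

# Label the cohort period: 1 for a cohort's first active month, 2 for the next, ...
# groupby sorts by OrderPeriod, so cumcount follows chronological order.
cohorts["CohortPeriod"] = cohorts.groupby("CohortGroup").cumcount() + 1

print(cohorts)
```

Note that this labelling counts only the months in which a cohort has at least one order. If a cohort skips a month entirely, the numbering will not reflect the gap, so a calendar-based month difference may be a better choice for sparse data.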
4. Display the result
Hang in there, we are almost there!
Now, to see the user retention pattern, we need to unstack the total user counts so that each cohort becomes its own series, and then plot a heat map of the resulting table. This lets us track the retention of users over time.
Now that all the manipulations have been performed, let us look at the resulting visualization and gain some insights.
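A minimal sketch of the unstacking step is shown below, using made-up counts and the assumed column names CohortGroup, CohortPeriod and TotalUsers. Dividing by the size of each cohort in its first period is one common way to turn counts into retention rates; it is an assumption here, not necessarily what the original code does.

```python
import pandas as pd

# Hypothetical aggregated table (one row per cohort group and cohort period).
cohorts = pd.DataFrame({
    "CohortGroup": ["2009-01", "2009-01", "2009-01", "2009-02", "2009-02"],
    "CohortPeriod": [1, 2, 3, 1, 2],
    "TotalUsers": [20, 10, 5, 8, 2],
})

# Unstack: rows are cohort periods, columns are cohort groups.
user_counts = (
    cohorts.set_index(["CohortGroup", "CohortPeriod"])["TotalUsers"]
    .unstack("CohortGroup")
)

# Cohort size = number of users active in the cohort's first period.
cohort_sizes = user_counts.loc[1]

# Fraction of each cohort that is still ordering in each period.
retention = user_counts.divide(cohort_sizes, axis=1)

print(retention)
```

The transposed table (`retention.T`) has one row per cohort and one column per period, which is the shape typically passed to a heat map function such as seaborn's `heatmap`. Periods a cohort has not reached yet show up as missing values.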
The first thing that we can observe is that fewer users from each cohort continue to purchase as time goes on.
The longest-running cohort is the one that started in 2009-01, which stayed active for fifteen months.
Another point to observe is the sudden surge in the number of users in certain months for some cohorts. A thorough investigation is needed in these cases to find the reason for the surge. It may reveal marketing activities that brought more users back, or other causes.
These and many more visualizations can be made using various dashboard and reporting tools.
Summary:
So far in this post, we have seen the steps taken to perform a cohort analysis and the insights we can derive from it. We have looked only at user retention in this example.
The following are some of the areas where immediate insights can be drawn from cohort analysis:
- Check if a new marketing campaign is generating customers: The essential part of every marketing campaign is to generate customers. One easy way to monitor whether a campaign is converting leads to customers is to use cohort analysis. Here, apart from looking at the order date as in our example, we should also use the campaign column (if it exists) to track the customers generated by a particular campaign over a period of time.
- See which sources are driving more results: In this digital age, it is common to market through multiple channels, and not all channels generate an equal number of users. To check which channel not only generates but also retains users, cohort analysis can be run per channel and the independent cohort graphs compared to identify the channel that generates the most customers.
- Decode the purchase habits of customers: Customers often differ in region and in the frequency or volume of their purchases. They also need not be of the same gender or belong to the same age group. By grouping customers by country, for example, and measuring the total revenue for each country, you can observe purchase habits and learn the lifetime value of customers.
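As a rough illustration of splitting cohorts by an extra attribute such as campaign, channel, or country, the sketch below adds a hypothetical Channel column to the grouping. The column names and sample rows are assumptions for the example and are not part of the dataset used in this post.

```python
import pandas as pd

# Hypothetical orders with an assumed Channel column.
orders = pd.DataFrame({
    "UserId": [1, 2, 3, 1, 3, 4],
    "Channel": ["email", "email", "social", "email", "social", "social"],
    "OrderPeriod": ["2009-01", "2009-01", "2009-01", "2009-02", "2009-02", "2009-02"],
    "CohortGroup": ["2009-01", "2009-01", "2009-01", "2009-01", "2009-01", "2009-02"],
})

# Count distinct users per channel, cohort group and month.
users = (
    orders.groupby(["Channel", "CohortGroup", "OrderPeriod"])["UserId"]
    .nunique()
    .rename("TotalUsers")
    .reset_index()
)

# Size of each (channel, cohort): users active in the cohort's first month.
first_month = users[users["OrderPeriod"] == users["CohortGroup"]]
sizes = (
    first_month.set_index(["Channel", "CohortGroup"])["TotalUsers"]
    .rename("CohortSize")
)

# Retention per channel = active users divided by the cohort's starting size.
users = users.join(sizes, on=["Channel", "CohortGroup"])
users["Retention"] = users["TotalUsers"] / users["CohortSize"]

print(users)
```

Replacing Channel with a campaign or country column gives the corresponding breakdown, and each channel's rows can then be plotted as its own cohort graph.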
References:
1. You can find the data and code used to do this exercise here
2. A definitive guide to effective cohort analysis
Feel free to drop your questions and thoughts in the comments, and we will be happy to help you out.
Cigin says
I believe you should definitely mention this article in your reference : http://www.gregreda.com/2015/08/23/cohort-analysis-with-python/.
Instead of the github link you have mentioned in your reference.
Thanks, Cigin
Pravin Singh says
Good resource, thanks for sharing!