Statistics Module
Tools to help you with analyzing and interpreting the uncertainty and variation in your reports.
When dealing with large datasets, you'll often want to understand the uncertainty and variation in your reports. That is, how confident you can be that your measurements accurately reflect the entire population, and how repeatable your findings will be. We aim to provide high-level statistical outputs while remaining approachable for non-statisticians.
Every report has a section in the table labeled Audience Representation. This shows active users as a percentage of the total audience size. In the table summary view, this is calculated against the average daily active users; when the data table is expanded, you will see the audience representation for each day.
For example, if your audience has 100,000 users and the audience representation on a given day is 1%, that day had 1,000 active users. The purpose of this measure is to ensure you aren't unknowingly drawing conclusions about the performance of a large user group from a very small number of users.
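As a quick illustration of the arithmetic above, the snippet below computes audience representation from hypothetical numbers; the variable names and figures are illustrative only, not part of the product.

```python
# Illustrative only: audience representation is the share of the total
# audience that was active, using made-up numbers.
total_audience = 100_000                    # total users in the audience
daily_active_users = [1_000, 1_250, 800]    # hypothetical active users per day

# Daily audience representation (what the expanded table would show)
daily_representation = [dau / total_audience for dau in daily_active_users]

# Representation against the average daily active users (summary view)
average_dau = sum(daily_active_users) / len(daily_active_users)
summary_representation = average_dau / total_audience

for day, share in enumerate(daily_representation, start=1):
    print(f"Day {day}: {share:.1%} of the audience was active")
print(f"Summary (vs. average DAU): {summary_representation:.2%}")
```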
Confidence measures the percentage chance that a random sample of your audience will show the same, or a greater, difference from the mean of the set baseline. You can read more about how to set and use confidence in the Measuring Confidence section.
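The exact calculation is covered in the Measuring Confidence section. Purely as a hedged sketch of the general idea, the snippet below estimates how often a random sample of user values shows a difference from a baseline mean at least as large as the observed one; the sample size, baseline value, and data are all hypothetical.

```python
import random
import statistics

random.seed(0)

# Hypothetical per-user values and a hypothetical baseline mean.
user_values = [random.gauss(5.0, 2.0) for _ in range(10_000)]
baseline_mean = 4.8
observed_diff = abs(statistics.mean(user_values) - baseline_mean)

# How often does a random sample show at least as large a difference?
sample_size = 500
trials = 2_000
hits = 0
for _ in range(trials):
    sample = random.sample(user_values, sample_size)
    if abs(statistics.mean(sample) - baseline_mean) >= observed_diff:
        hits += 1

print(f"Estimated confidence: {hits / trials:.0%}")
```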
1st std dev shading incorporates 68% of users; 2nd std dev shading incorporates 95% of users in an audience.
The 1st standard cohort deviation shows the dispersion of the 68% of individual cohort LTVs closest to the mean. The 2nd standard cohort deviation shows the dispersion of the 95% of individual cohort LTVs closest to the mean. In other words, the shaded area represents the 68% or 95% of users, respectively, that fall closest to the mean.
Deviations are generally used to measure how much your users' performance varies around the mean. A high deviation means user values are generally far from the mean, while a low deviation means user values are clustered close to it. You can read more about how deviations can help with analysis here.
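To make the shading concrete, here is a minimal sketch with made-up cohort LTVs (not the product's data model) that computes the mean and the 1st and 2nd standard deviation bands around it.

```python
import statistics

# Hypothetical LTVs for a set of daily install cohorts.
cohort_ltvs = [1.85, 2.10, 1.95, 2.40, 2.05, 1.70, 2.25, 1.90, 2.15, 2.00]

mean_ltv = statistics.mean(cohort_ltvs)
sd = statistics.stdev(cohort_ltvs)  # sample standard deviation

# Shaded bands: roughly 68% of values fall within 1 SD of the mean,
# and roughly 95% within 2 SD (assuming an approximately normal spread).
band_1sd = (mean_ltv - sd, mean_ltv + sd)
band_2sd = (mean_ltv - 2 * sd, mean_ltv + 2 * sd)

print(f"Mean LTV: {mean_ltv:.2f}")
print(f"1st std dev band (~68%): {band_1sd[0]:.2f} to {band_1sd[1]:.2f}")
print(f"2nd std dev band (~95%): {band_2sd[0]:.2f} to {band_2sd[1]:.2f}")
```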
The standard cohort error of the mean is a measure of the variability of daily cohort means around the population (displayed) mean.
Since each cohort's average (by day of install) can differ from the overall mean, the standard error shows how much the cohort means differ from the population mean. The standard cohort error of the mean tells you how close the average of the cohort samples is to the average of the whole group, i.e. how precise the overall LTV average is relative to an individual cohort's LTV. The smaller the standard error, the more representative a random day's LTV will be of the overall population; conversely, a large standard error indicates that an individual cohort's LTV is less representative of the population mean. For more information on standard error, please see this article.
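As a rough sketch of the idea (not the product's exact formula), the standard error of the mean can be estimated from the same hypothetical cohort LTVs as the standard deviation divided by the square root of the number of cohorts.

```python
import math
import statistics

# Hypothetical LTVs for daily install cohorts.
cohort_ltvs = [1.85, 2.10, 1.95, 2.40, 2.05, 1.70, 2.25, 1.90, 2.15, 2.00]

sd = statistics.stdev(cohort_ltvs)
se = sd / math.sqrt(len(cohort_ltvs))  # standard error of the mean

print(f"Standard deviation of cohort LTVs: {sd:.3f}")
print(f"Standard error of the mean:        {se:.3f}")
# A smaller SE means the overall LTV average is a more precise
# estimate of the population mean.
```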
What's the difference between the SD & SE?
The standard deviation (SD) measures the amount of variability of the individual cohorts around the mean, while the standard error of the mean (SE) measures how far the sample mean of the data is likely to be from the true population mean. The SE is always smaller than the SD. Source
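The snippet below illustrates the difference with simulated cohort LTVs (hypothetical values only): as the number of cohorts grows, the SD stays roughly constant while the SE shrinks with the square root of the cohort count, which is why the SE is always the smaller of the two.

```python
import math
import random
import statistics

random.seed(1)

# SD describes the spread of individual cohort LTVs; SE describes the
# precision of their mean and shrinks as more cohorts accrue.
for n in (10, 100, 1_000):
    ltvs = [random.gauss(2.0, 0.3) for _ in range(n)]
    sd = statistics.stdev(ltvs)
    se = sd / math.sqrt(n)
    print(f"n={n:>5}  SD={sd:.3f}  SE={se:.4f}")
```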
We also allow you to apply and visualize statistics on forecasted LTVs. We apply the statistics calculation to the population mean and then use the projection model chosen in the Forecasting Module. The idea is to show the projected dispersion of user values around the forecasted mean.
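As a hedged sketch of what applying statistics to a forecast might look like, the snippet below wraps a hypothetical projected LTV curve in ±1 and ±2 standard deviation bands; the observed values, the spread, and the simple linear-extrapolation step are illustrative stand-ins for whatever model is chosen in the Forecasting Module.

```python
# Hypothetical observed mean LTV by day since install.
observed_mean_ltv = [0.50, 0.85, 1.10, 1.30, 1.45, 1.55]
# Hypothetical spread of individual cohort LTVs around that mean.
cohort_sd = 0.25

# Stand-in projection: extend the curve by repeating the last daily increment.
# (The real projection would come from the chosen forecasting model.)
last_step = observed_mean_ltv[-1] - observed_mean_ltv[-2]
projected_mean_ltv = observed_mean_ltv + [
    observed_mean_ltv[-1] + last_step * i for i in range(1, 4)
]

# Dispersion bands applied on top of the projected mean.
for day, mean in enumerate(projected_mean_ltv, start=1):
    lo1, hi1 = mean - cohort_sd, mean + cohort_sd
    lo2, hi2 = mean - 2 * cohort_sd, mean + 2 * cohort_sd
    print(f"Day {day}: mean={mean:.2f}  "
          f"±1SD=[{lo1:.2f}, {hi1:.2f}]  ±2SD=[{lo2:.2f}, {hi2:.2f}]")
```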