Understanding A/B Test Sample Size
A/B testing, also known as split testing, is a method of comparing two versions of a webpage or app against each other to determine which one performs better. The goal is to understand how a variation affects a user's behavior, typically measured by a key performance indicator like conversion rate. A crucial aspect of running a statistically sound A/B test is determining the appropriate sample size.
Running an A/B test without adequate sample size can lead to unreliable results. You might miss a real, albeit small, improvement (a Type II error) or incorrectly conclude that a change had an effect when it didn't (a Type I error). This calculator helps you determine the minimum number of users you need to expose to each variation of your test to achieve statistically significant results.
Key Concepts:
Baseline Conversion Rate: This is the current conversion rate of your control (original) version. It's a historical performance metric that forms the foundation for your sample size calculation. For a fixed relative MDE, a higher baseline conversion rate generally requires a smaller sample size, because the absolute difference you are trying to detect is larger.
Minimum Detectable Effect (MDE): This is the smallest improvement in conversion rate that you want to be able to confidently detect. It's often expressed as a relative percentage (e.g., a 20% increase over the baseline) or an absolute percentage point difference (e.g., a 1% increase from 10% to 11%). A smaller MDE requires a larger sample size.
Statistical Power (1 – Beta): This represents the probability of correctly detecting a true effect if it exists. A common standard is 80% or 90% power. Higher power means a lower chance of a Type II error (failing to detect a real effect). Achieving higher power requires a larger sample size.
Significance Level (Alpha): This is the probability of a Type I error – concluding that there is a difference when there isn't one (a false positive). Common values are 5% (0.05) or 1% (0.01). A lower significance level (higher confidence) requires a larger sample size.
Test Duration: While not directly used in the core statistical formula for sample size per variation, it's a practical consideration. Knowing your required sample size per variation and your expected daily/weekly traffic allows you to estimate how long your test will need to run to gather sufficient data.
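For instance, here is a minimal sketch of that duration estimate; the helper name and the weekly traffic figure are illustrative assumptions, not inputs of the calculator below:

// Estimate how long a test must run, given the required sample size per
// variation and the site's weekly eligible traffic (hypothetical figure).
function estimateTestDurationWeeks(sampleSizePerVariation, numVariations, weeklyTraffic) {
  // Total users needed across all variations of the test
  var totalNeeded = sampleSizePerVariation * numVariations;
  // Round up: a partial week still has to be run in full
  return Math.ceil(totalNeeded / weeklyTraffic);
}
// Example: 3,842 users per variation, 2 variations, 5,000 eligible users per week
estimateTestDurationWeeks(3842, 2, 5000); // => 2 (weeks)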
The Math Behind the Calculator
The calculation of sample size for A/B tests often relies on formulas derived from statistical principles for comparing two proportions. A common approach involves using the normal approximation to the binomial distribution.
Let:
\(p_1\) be the baseline conversion rate (control group)
\(p_2\) be the expected conversion rate for the variation (i.e., \(p_1 \times (1 + \text{MDE relative})\) or \(p_1 + \text{MDE absolute}\))
\(P = (p_1 + p_2) / 2\) be the pooled proportion
\(Z_{\alpha/2}\) be the Z-score corresponding to the significance level (e.g., for alpha=0.05, \(Z_{0.025} \approx 1.96\))
\(Z_{\beta}\) be the Z-score corresponding to the statistical power (e.g., for 80% power, beta=0.20, \(Z_{0.20} \approx 0.84\); for 90% power, beta=0.10, \(Z_{0.10} \approx 1.28\))
The formula for the sample size per variation \(n\) is approximately:
\[
n = \frac{\left( Z_{\alpha/2}\sqrt{2P(1-P)} + Z_{\beta}\sqrt{p_1(1-p_1) + p_2(1-p_2)} \right)^2}{(p_1 - p_2)^2}
\]
This calculator simplifies the input for MDE by allowing either an absolute or relative percentage. The calculation then determines \(p_2\) based on the provided MDE and \(p_1\). The final result is the number of users needed for *each* variation (control and treatment).
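To make the formula concrete, here it is worked through once in plain JavaScript with illustrative inputs (a 10% baseline, a 20% relative MDE, alpha = 0.05, and 80% power); these numbers are assumptions chosen for demonstration only:

// Worked example of the formula above, with illustrative inputs
var p1 = 0.10;            // 10% baseline conversion rate
var p2 = p1 * (1 + 0.20); // 20% relative MDE => 0.12
var P = (p1 + p2) / 2;    // pooled proportion = 0.11
var zAlphaHalf = 1.96;    // two-sided alpha = 0.05
var zBeta = 0.842;        // 80% power (beta = 0.20)
var numerator = Math.pow(
  zAlphaHalf * Math.sqrt(2 * P * (1 - P)) +
  zBeta * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)),
  2
);
var n = numerator / Math.pow(p1 - p2, 2);
console.log(Math.ceil(n)); // ~3842 users per variation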
When to Use This Calculator:
This calculator is invaluable for anyone planning an A/B test for:
Website landing pages
Product page designs
Call-to-action buttons
Email subject lines
App features
Marketing campaigns
Any scenario where you want to test variations of a user interface or experience to optimize conversion rates or other key metrics.
By ensuring you have adequate sample size, you increase the confidence in your A/B test results and make more informed decisions.
function calculateSampleSize() {
var baselineCR = parseFloat(document.getElementById("baselineConversionRate").value);
var mdeInput = parseFloat(document.getElementById("minimumDetectableEffect").value);
var power = parseFloat(document.getElementById("statisticalPower").value);
var alpha = parseFloat(document.getElementById("significanceLevel").value);
var durationWeeks = parseInt(document.getElementById("testDuration").value); // Not used in core calc, but good for context
// Input validation
if (isNaN(baselineCR) || baselineCR <= 0 || baselineCR >= 100) {
alert("Please enter a valid Baseline Conversion Rate between 0 and 100.");
return;
}
if (isNaN(mdeInput) || mdeInput <= 0) {
alert("Please enter a valid Minimum Detectable Effect (MDE) greater than 0.");
return;
}
if (isNaN(durationWeeks) || durationWeeks <= 0) {
alert("Please enter a valid Test Duration in weeks (greater than 0).");
return;
}
// Convert percentages to decimals
var p1 = baselineCR / 100;
var power_decimal = power; // power is already in decimal form from select
var alpha_decimal = alpha; // alpha is already in decimal form from select
// Interpret the MDE. This calculator uses a simple heuristic instead of
// asking the user explicitly: values below 5 are treated as absolute
// percentage-point differences, larger values as relative percentages.
// A more robust implementation would let the user choose the interpretation.
var p2;
if (mdeInput < 5) { // Heuristic: if MDE is small (e.g., < 5 percentage points), treat it as absolute.
p2 = p1 + (mdeInput / 100);
} else { // Otherwise, treat it as relative.
p2 = p1 * (1 + (mdeInput / 100));
}
// Ensure p2 does not exceed 1
p2 = Math.min(p2, 1);
// Z-scores for significance level and power
var zAlphaHalf = getZScore(alpha_decimal / 2);
var zBeta = getZScore(1 - power_decimal);
// Pooled proportion
var P = (p1 + p2) / 2;
// Sample size formula (per variation)
var numerator = Math.pow(zAlphaHalf * Math.sqrt(2 * P * (1 - P)) + zBeta * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)), 2);
var denominator = Math.pow(p1 - p2, 2);
if (denominator === 0) {
alert("The baseline conversion rate and the target conversion rate are the same. Cannot calculate sample size.");
return;
}
var n = numerator / denominator;
// Round up to the nearest whole number
var requiredSampleSizePerVariation = Math.ceil(n);
// Calculate total sample size
var totalSampleSize = requiredSampleSizePerVariation * 2;
// Display results
document.getElementById("result-value").innerText = requiredSampleSizePerVariation.toLocaleString();
var resultExplanation = `This is the minimum number of users needed for each variation (Control and Treatment).`;
if (mdeInput < 5) {
resultExplanation += ` We assumed the MDE of ${mdeInput}% was an absolute difference.`;
} else {
resultExplanation += ` We assumed the MDE of ${mdeInput}% was a relative difference.`;
}
resultExplanation += ` With a baseline conversion rate of ${baselineCR}%, a significance level of ${alpha * 100}%, and ${power * 100}% statistical power, this is the sample size needed to reliably detect a difference of at least ${mdeInput}% under that interpretation.`;
document.getElementById("result-explanation").innerHTML = resultExplanation;
document.getElementById("result").style.display = "block";
}
// Helper function to get Z-score for common alpha levels.
// This is a simplified lookup. For exact values, a statistical library or inverse CDF function would be used.
function getZScore(probability) {
  // Compare with a small tolerance: 1 - 0.9 and 1 - 0.8 do not yield exactly
  // 0.1 and 0.2 in floating point, so strict equality lookups would fail.
  var eps = 1e-9;
  if (Math.abs(probability - 0.025) < eps) return 1.96;  // for alpha = 0.05
  if (Math.abs(probability - 0.005) < eps) return 2.576; // for alpha = 0.01
  if (Math.abs(probability - 0.10) < eps) return 1.282;  // for power 0.90 (beta = 0.10)
  if (Math.abs(probability - 0.20) < eps) return 0.842;  // for power 0.80 (beta = 0.20)
  if (Math.abs(probability - 0.05) < eps) return 1.645;  // for power 0.95 (beta = 0.05)
  // Fallback for less common values; the cases above cover most A/B testing scenarios.
  console.warn("Using approximation for Z-score for probability: " + probability);
  if (probability > 0.5) {
    // By symmetry, the upper-tail z-score for a probability above 0.5 is negative.
    return -getZScore(1 - probability);
  }
  return 0; // Default fallback
}
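If more precision is needed than the lookup table provides, the fallback could instead use a closed-form approximation of the inverse normal CDF. Below is a minimal sketch using the classic Abramowitz & Stegun rational approximation (formula 26.2.23, accurate to roughly 4.5e-4); approxZScore is an illustrative helper, not part of the calculator above:

// Approximate upper-tail z-score using the Abramowitz & Stegun
// rational approximation (formula 26.2.23), accurate to about 4.5e-4.
function approxZScore(p) {
  if (p <= 0 || p >= 1) return NaN;         // outside the valid range
  if (p > 0.5) return -approxZScore(1 - p); // symmetry for the lower tail
  var t = Math.sqrt(Math.log(1 / (p * p)));
  return t - (2.515517 + 0.802853 * t + 0.010328 * t * t) /
    (1 + 1.432788 * t + 0.189269 * t * t + 0.001308 * t * t * t);
}
// Example: approxZScore(0.025) ≈ 1.9604, versus the exact 1.95996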
// Initialize with default values if available or for first load
document.addEventListener('DOMContentLoaded', function() {
// Set default values if inputs are empty, otherwise preserve user input
if (document.getElementById("baselineConversionRate").value === "") document.getElementById("baselineConversionRate").value = "10.0";
if (document.getElementById("minimumDetectableEffect").value === "") document.getElementById("minimumDetectableEffect").value = "5"; // e.g., 5% relative
if (document.getElementById("statisticalPower").value === "") document.getElementById("statisticalPower").value = "0.9";
if (document.getElementById("significanceLevel").value === "") document.getElementById("significanceLevel").value = "0.05";
if (document.getElementById("testDuration").value === "") document.getElementById("testDuration").value = "2";
});