A/B Test Sample Size Calculator


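Inputs:

  • Baseline Conversion Rate: e.g., 10.5%
  • Minimum Detectable Effect: e.g., 1% or 20% relative
  • Statistical Power: 80%, 90%, or 95% (probability of detecting an effect if it exists)
  • Significance Level: 5% or 1% (probability of a Type I error, i.e., a false positive)
  • Test Duration: e.g., 2 weeks

Output: Required Sample Size Per Variation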

Understanding A/B Test Sample Size

A/B testing, also known as split testing, is a method of comparing two versions of a webpage or app against each other to determine which one performs better. The goal is to understand how a variation affects a user's behavior, typically measured by a key performance indicator like conversion rate. A crucial aspect of running a statistically sound A/B test is determining the appropriate sample size.

Running an A/B test without an adequate sample size can lead to unreliable results. You might miss a real, albeit small, improvement (a Type II error) or incorrectly conclude that a change had an effect when it didn't (a Type I error). This calculator helps you determine the minimum number of users you need to expose to each variation so that the test can detect your chosen minimum effect with the desired power at your chosen significance level.

Key Concepts:

  • Baseline Conversion Rate: This is the current conversion rate of your control (original) version. It's a historical performance metric that forms the foundation for your sample size calculation. For the same relative MDE, a higher baseline conversion rate generally requires a smaller sample size. For the same absolute difference, the required sample size instead grows as the baseline approaches 50%, because the variance \(p(1-p)\) is largest there.
  • Minimum Detectable Effect (MDE): This is the smallest improvement in conversion rate that you want to be able to confidently detect. It's often expressed as a relative percentage (e.g., a 20% increase over the baseline, from 10% to 12%) or an absolute percentage point difference (e.g., a 1 percentage point increase from 10% to 11%). A smaller MDE requires a larger sample size.
  • Statistical Power (1 – Beta): This represents the probability of correctly detecting a true effect if it exists. A common standard is 80% or 90% power. Higher power means a lower chance of a Type II error (failing to detect a real effect). Achieving higher power requires a larger sample size.
  • Significance Level (Alpha): This is the probability of a Type I error – concluding that there is a difference when there isn't one (a false positive). Common values are 5% (0.05) or 1% (0.01). A lower significance level (higher confidence) requires a larger sample size.
  • Test Duration: While not directly used in the core statistical formula for sample size per variation, it's a practical consideration. Knowing your required sample size per variation and your expected daily/weekly traffic allows you to estimate how long your test will need to run to gather sufficient data.
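The duration estimate described in the last bullet can be sketched as follows. This is a minimal illustration, not part of the calculator: the function name `estimateTestDurationDays` and its parameters are hypothetical, and it assumes every daily visitor enters the test and traffic is split evenly across variations.

```javascript
// Hypothetical helper: estimate how many days a test must run.
// Assumes all daily visitors enter the test and are split evenly.
function estimateTestDurationDays(samplePerVariation, numVariations, dailyVisitors) {
  if (!(samplePerVariation > 0) || !(numVariations >= 2) || !(dailyVisitors > 0)) {
    throw new RangeError("Expected a positive sample size, at least 2 variations, and positive traffic.");
  }
  // Total users needed across all variations
  const totalSample = samplePerVariation * numVariations;
  // Round up: a partial day still has to be run
  return Math.ceil(totalSample / dailyVisitors);
}

// Example with assumed numbers: 3,842 users per variation, 2 variations, 1,000 visitors per day
const days = estimateTestDurationDays(3842, 2, 1000);
const weeks = Math.ceil(days / 7);
console.log(days);  // 8
console.log(weeks); // 2
```

Rounding the duration up to whole weeks, as in the last line, is a common practical choice because conversion behavior often differs between weekdays and weekends.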

The Math Behind the Calculator

The calculation of sample size for A/B tests often relies on formulas derived from statistical principles for comparing two proportions. A common approach involves using the normal approximation to the binomial distribution.

Let:

  • \(p_1\) be the baseline conversion rate (control group)
  • \(p_2\) be the expected conversion rate for the variation (i.e., \(p_1 \times (1 + \text{MDE relative})\) or \(p_1 + \text{MDE absolute}\))
  • \(P = (p_1 + p_2) / 2\) be the pooled proportion
  • \(Z_{\alpha/2}\) be the Z-score corresponding to the significance level (e.g., for alpha=0.05, \(Z_{0.025} \approx 1.96\))
  • \(Z_{\beta}\) be the Z-score corresponding to the statistical power (e.g., for 80% power, beta=0.20, \(Z_{0.20} \approx 0.84\); for 90% power, beta=0.10, \(Z_{0.10} \approx 1.28\))
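The Z-scores quoted above are usually read from a table. For significance levels or power values that are not in a table, one option is a closed-form approximation of the inverse normal CDF. The sketch below uses the rational approximation from Abramowitz and Stegun (formula 26.2.23), whose absolute error is below roughly 0.00045; the function name `upperTailZ` is an assumption for illustration and is not part of the calculator.

```javascript
// Returns z such that P(Z > z) = p for a standard normal Z.
// Rational approximation from Abramowitz & Stegun, formula 26.2.23.
function upperTailZ(p) {
  if (!(p > 0 && p < 1)) {
    throw new RangeError("p must be strictly between 0 and 1");
  }
  // The approximation covers 0 < p <= 0.5; use symmetry for the rest
  if (p > 0.5) {
    return -upperTailZ(1 - p);
  }
  const t = Math.sqrt(-2 * Math.log(p));
  const numerator = 2.515517 + 0.802853 * t + 0.010328 * t * t;
  const denominator = 1 + 1.432788 * t + 0.189269 * t * t + 0.001308 * t * t * t;
  return t - numerator / denominator;
}

// Z-scores for alpha = 0.05 (two-sided) and 80% power
console.log(upperTailZ(0.05 / 2).toFixed(2)); // "1.96"
console.log(upperTailZ(1 - 0.80).toFixed(2)); // "0.84"
```

The accuracy is enough for planning purposes; a statistics library would be the better choice when more decimal places matter.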

The formula for the sample size per variation (n) is approximately:

\[ n = \frac{\left( Z_{\alpha/2} \sqrt{2P(1-P)} + Z_{\beta} \sqrt{p_1(1-p_1) + p_2(1-p_2)} \right)^2}{(p_1 - p_2)^2} \]
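As a worked example, take illustrative inputs that are assumptions rather than recommendations: a baseline of \(p_1 = 0.10\), a 20% relative MDE so that \(p_2 = 0.12\), a 5% significance level (\(Z_{\alpha/2} \approx 1.96\)), and 80% power (\(Z_{\beta} \approx 0.842\), the slightly more precise value of 0.84). Substituting into the formula above gives:

```latex
\begin{aligned}
P &= \frac{0.10 + 0.12}{2} = 0.11 \\
Z_{\alpha/2}\sqrt{2P(1-P)} &= 1.96\sqrt{2(0.11)(0.89)} = 1.96\sqrt{0.1958} \approx 0.8673 \\
Z_{\beta}\sqrt{p_1(1-p_1) + p_2(1-p_2)} &= 0.842\sqrt{0.0900 + 0.1056} = 0.842\sqrt{0.1956} \approx 0.3724 \\
n &\approx \frac{(0.8673 + 0.3724)^2}{(0.10 - 0.12)^2} \approx \frac{1.5369}{0.0004} \approx 3842
\end{aligned}
```

Under these assumptions the test needs about 3,842 users in each variation, or roughly 7,700 in total.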

This calculator accepts the MDE as either an absolute or a relative percentage. Its script guesses which one is meant: an input below 5 is treated as an absolute difference in percentage points, and an input of 5 or more as a relative percentage. It then derives \(p_2\) from the MDE and \(p_1\). The final result is the number of users needed for *each* variation (control and treatment).
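Because a bare number does not say whether the MDE is absolute or relative, one alternative is to make the mode an explicit input. The following is a minimal sketch of that idea; the function `resolveTargetRate` and its `mode` parameter are hypothetical and not part of the calculator, and all rates are passed as proportions rather than percentages.

```javascript
// Hypothetical helper: derive the target rate p2 from the baseline p1 and the MDE.
// p1 and mde are proportions (0.10 means 10%); mode is "absolute" or "relative".
function resolveTargetRate(p1, mde, mode) {
  if (!(p1 > 0 && p1 < 1)) {
    throw new RangeError("Baseline rate must be strictly between 0 and 1");
  }
  if (!(mde > 0)) {
    throw new RangeError("MDE must be greater than 0");
  }
  let p2;
  if (mode === "absolute") {
    p2 = p1 + mde;        // e.g., 10% + 1 percentage point
  } else if (mode === "relative") {
    p2 = p1 * (1 + mde);  // e.g., 10% increased by 20%
  } else {
    throw new Error('mode must be "absolute" or "relative"');
  }
  if (p2 >= 1) {
    throw new RangeError("Target rate must stay below 100%");
  }
  return p2;
}

console.log(resolveTargetRate(0.10, 0.01, "absolute")); // about 0.11
console.log(resolveTargetRate(0.10, 0.20, "relative")); // about 0.12
```

Rejecting a target rate of 100% or more, instead of silently capping it, surfaces an implausible MDE to the user rather than producing a misleading sample size.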

When to Use This Calculator:

This calculator is invaluable for anyone planning an A/B test for:

  • Website landing pages
  • Product page designs
  • Call-to-action buttons
  • Email subject lines
  • App features
  • Marketing campaigns
  • Any scenario where you want to test variations of a user interface or experience to optimize conversion rates or other key metrics.

By ensuring you have adequate sample size, you increase the confidence in your A/B test results and make more informed decisions.

function calculateSampleSize() {
    var baselineCR = parseFloat(document.getElementById("baselineConversionRate").value);
    var mdeInput = parseFloat(document.getElementById("minimumDetectableEffect").value);
    var power = parseFloat(document.getElementById("statisticalPower").value);
    var alpha = parseFloat(document.getElementById("significanceLevel").value);
    // Not used in the core calculation, but validated for context
    var durationWeeks = parseInt(document.getElementById("testDuration").value, 10);

    // Input validation
    if (isNaN(baselineCR) || baselineCR <= 0 || baselineCR >= 100) {
        alert("Please enter a valid Baseline Conversion Rate between 0 and 100.");
        return;
    }
    if (isNaN(mdeInput) || mdeInput <= 0) {
        alert("Please enter a valid Minimum Detectable Effect (MDE) greater than 0.");
        return;
    }
    if (isNaN(durationWeeks) || durationWeeks <= 0) {
        alert("Please enter a valid Test Duration in weeks (greater than 0).");
        return;
    }

    // Convert the baseline percentage to a proportion.
    // Power and alpha already arrive in decimal form from the select elements.
    var p1 = baselineCR / 100;
    var power_decimal = power;
    var alpha_decimal = alpha;

    // Heuristic: an MDE below 5 is treated as an absolute difference in
    // percentage points, anything else as a relative percentage.
    // Asking the user explicitly for relative/absolute would be more robust.
    var p2;
    if (mdeInput < 5) {
        p2 = p1 + (mdeInput / 100);
    } else {
        p2 = p1 * (1 + (mdeInput / 100));
    }
    // Ensure p2 does not exceed 1
    p2 = Math.min(p2, 1);

    // Z-scores for significance level and power
    var zAlphaHalf = getZScore(alpha_decimal / 2);
    var zBeta = getZScore(1 - power_decimal);

    // Pooled proportion
    var P = (p1 + p2) / 2;

    // Sample size formula (per variation)
    var numerator = Math.pow(zAlphaHalf * Math.sqrt(2 * P * (1 - P)) + zBeta * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)), 2);
    var denominator = Math.pow(p1 - p2, 2);
    if (denominator === 0) {
        alert("The baseline conversion rate and the target conversion rate are the same. Cannot calculate sample size.");
        return;
    }
    var n = numerator / denominator;

    // Round up to the nearest whole number
    var requiredSampleSizePerVariation = Math.ceil(n);
    // Total sample size across control and treatment (computed but not displayed)
    var totalSampleSize = requiredSampleSizePerVariation * 2;

    // Display results
    document.getElementById("result-value").innerText = requiredSampleSizePerVariation.toLocaleString();
    var resultExplanation = `This is the minimum number of users needed for each variation (Control and Treatment).`;
    if (mdeInput < 5) {
        resultExplanation += ` We assumed the MDE of ${mdeInput}% was an absolute difference.`;
    } else {
        resultExplanation += ` We assumed the MDE of ${mdeInput}% was a relative difference.`;
    }
    resultExplanation += ` With a baseline conversion rate of ${baselineCR}%, a significance level of ${alpha * 100}%, and ${power * 100}% statistical power, this ensures you can detect a difference of at least ${mdeInput}% (as interpreted).`;
    document.getElementById("result-explanation").innerHTML = resultExplanation;
    document.getElementById("result").style.display = "block";
}

// Helper function to get the Z-score for common tail probabilities.
// This is a simplified lookup. For exact values, a statistical library
// or an inverse CDF function would be used.
function getZScore(probability) {
    var table = [
        [0.025, 1.96],  // alpha = 0.05 (two-sided)
        [0.005, 2.576], // alpha = 0.01 (two-sided)
        [0.10, 1.282],  // power 0.90 (beta = 0.10)
        [0.20, 0.842],  // power 0.80 (beta = 0.20)
        [0.05, 1.645]   // power 0.95 (beta = 0.05)
    ];
    // Compare with a tolerance: 1 - 0.9 evaluates to 0.09999999999999998,
    // so a strict === check would miss the table entry.
    for (var i = 0; i < table.length; i++) {
        if (Math.abs(probability - table[i][0]) < 1e-9) return table[i][1];
    }
    // Fallback for values outside the table
    console.warn("Using approximation for Z-score for probability: " + probability);
    if (probability > 0.5) {
        // Symmetry of the normal distribution: z(p) = -z(1 - p)
        return -getZScore(1 - probability);
    }
    return 0; // Default fallback
}

// Set default values on first load, preserving any existing user input
document.addEventListener('DOMContentLoaded', function() {
    var defaults = {
        baselineConversionRate: "10.0",
        minimumDetectableEffect: "5", // e.g., 5% relative
        statisticalPower: "0.9",
        significanceLevel: "0.05",
        testDuration: "2"
    };
    Object.keys(defaults).forEach(function(id) {
        var input = document.getElementById(id);
        if (input.value === "") input.value = defaults[id];
    });
});
