How to Calculate Inter-Rater Reliability


Inter-Rater Reliability Calculator
(Cohen's Kappa for 2 Raters)

function calculateKappa() {
    // 1. Get input values (the four cells of the 2×2 table)
    var a = parseFloat(document.getElementById('cell_a').value);
    var b = parseFloat(document.getElementById('cell_b').value);
    var c = parseFloat(document.getElementById('cell_c').value);
    var d = parseFloat(document.getElementById('cell_d').value);

    // 2. Validate inputs: treat empty or non-numeric fields as 0
    if (isNaN(a)) a = 0;
    if (isNaN(b)) b = 0;
    if (isNaN(c)) c = 0;
    if (isNaN(d)) d = 0;

    var total = a + b + c + d;
    var resultArea = document.getElementById('result-area');

    // Edge case: No data entered
    if (total === 0) {
        alert("Please enter at least one observation greater than zero.");
        resultArea.style.display = 'none';
        return;
    }

    // 3. Calculate Observed Agreement (Po)
    // Po = (a + d) / Total
    var observedAgreement = (a + d) / total;

    // 4. Calculate Expected Agreement (Pe) based on chance
    // Marginal Totals
    var rater1_yes = a + b;
    var rater1_no = c + d;
    var rater2_yes = a + c;
    var rater2_no = b + d;

    // Probability of Yes and No by chance
    var prob_yes = (rater1_yes / total) * (rater2_yes / total);
    var prob_no = (rater1_no / total) * (rater2_no / total);
    var expectedAgreement = prob_yes + prob_no;

    // 5. Calculate Cohen's Kappa (k)
    // k = (Po - Pe) / (1 - Pe)
    var kappa = 0;
    if (expectedAgreement === 1) {
        // Edge case: Perfect expected agreement (usually means uniform data), prevent divide by zero
        kappa = (observedAgreement === 1) ? 1 : 0;
    } else {
        kappa = (observedAgreement - expectedAgreement) / (1 - expectedAgreement);
    }

    // 6. Interpret the score (Landis & Koch, 1977)
    var interpretation = "";
    if (kappa < 0) {
        interpretation = "Poor Agreement (Less than chance)";
    } else if (kappa <= 0.20) {
        interpretation = "Slight Agreement";
    } else if (kappa <= 0.40) {
        interpretation = "Fair Agreement";
    } else if (kappa <= 0.60) {
        interpretation = "Moderate Agreement";
    } else if (kappa <= 0.80) {
        interpretation = "Substantial Agreement";
    } else {
        interpretation = "Almost Perfect Agreement";
    }

    // 7. Update DOM
    document.getElementById('total-n').innerText = total;
    document.getElementById('po-val').innerText = (observedAgreement * 100).toFixed(2) + "%";
    document.getElementById('pe-val').innerText = (expectedAgreement * 100).toFixed(2) + "%";
    document.getElementById('kappa-display').innerText = "κ = " + kappa.toFixed(3);
    document.getElementById('interpretation-display').innerText = interpretation;
    resultArea.style.display = 'block';
}

How to Calculate Inter-Rater Reliability (Cohen's Kappa)

Inter-rater reliability (IRR) measures how consistently different judges or raters score the same items. A high IRR means the raters largely reach the same conclusions; a low IRR suggests that the rating scale or its instructions leave too much room for interpretation. This makes IRR useful for refining the tools given to human judges, for example by determining whether a particular scale is appropriate for measuring a particular variable.

While a simple "Percent Agreement" calculation shows how often two raters agreed, it does not account for agreement that would have occurred by random chance. This calculator uses Cohen's Kappa, which corrects for chance agreement and is therefore generally considered the more robust measure.
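To make the weakness of percent agreement concrete, the sketch below compares observed agreement with the agreement expected by chance for a deliberately lopsided data set. The function name `agreementVsChance` and the counts are illustrative assumptions, not part of the calculator; the chance term uses the marginal-total approach described later in this article.

```javascript
// Hypothetical helper: returns observed agreement and the agreement
// expected by chance for a 2×2 table of counts.
// a = both "Yes", b = Rater 1 "Yes" / Rater 2 "No",
// c = Rater 1 "No" / Rater 2 "Yes", d = both "No".
function agreementVsChance(a, b, c, d) {
  const n = a + b + c + d;
  const observed = (a + d) / n;
  const chanceYes = ((a + b) / n) * ((a + c) / n);
  const chanceNo = ((c + d) / n) * ((b + d) / n);
  return { observed, expectedByChance: chanceYes + chanceNo };
}

// Illustrative counts: both raters say "Yes" to almost every item.
const result = agreementVsChance(90, 5, 5, 0);
console.log(
  `observed: ${result.observed.toFixed(3)}, ` +
  `chance: ${result.expectedByChance.toFixed(3)}`
);
// observed: 0.900, chance: 0.905
```

Here the raters agree on 90% of items, yet two raters who each said "Yes" 95% of the time at random would be expected to agree slightly more often than that. Percent agreement alone would hide this.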

Understanding the 2×2 Contingency Table

To calculate Kappa for two raters classifying items into two categories (e.g., "Yes/No" or "Present/Absent"), we organize the data into a grid:

  • Cell a: Both Rater 1 and Rater 2 said "Yes".
  • Cell b: Rater 1 said "Yes", but Rater 2 said "No".
  • Cell c: Rater 1 said "No", but Rater 2 said "Yes".
  • Cell d: Both Rater 1 and Rater 2 said "No".
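If the ratings start out as two parallel lists rather than as counts, the four cells can be tallied first. The following is a minimal sketch; the function name `tallyRatings` and the use of the strings "Yes" and "No" are assumptions made for illustration.

```javascript
// Hypothetical helper: tally two parallel arrays of ratings into the
// four cells of the 2×2 table. Assumption: any value other than "Yes"
// is counted as "No".
function tallyRatings(rater1, rater2) {
  if (rater1.length !== rater2.length) {
    throw new Error("Both raters must rate the same number of items.");
  }
  const cells = { a: 0, b: 0, c: 0, d: 0 };
  for (let i = 0; i < rater1.length; i++) {
    const r1 = rater1[i] === "Yes";
    const r2 = rater2[i] === "Yes";
    if (r1 && r2) cells.a++;        // both said "Yes"
    else if (r1 && !r2) cells.b++;  // Rater 1 "Yes", Rater 2 "No"
    else if (!r1 && r2) cells.c++;  // Rater 1 "No", Rater 2 "Yes"
    else cells.d++;                 // both said "No"
  }
  return cells;
}

const rater1 = ["Yes", "Yes", "No", "No", "Yes"];
const rater2 = ["Yes", "No", "Yes", "No", "Yes"];
console.log(tallyRatings(rater1, rater2)); // { a: 2, b: 1, c: 1, d: 1 }
```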

The Formula

Cohen's Kappa (κ) is calculated using the following formula:

κ = (Po – Pe) / (1 – Pe)

Where:

  • Po (Observed Agreement): The actual proportion of times the raters agreed.
    Formula: Po = (a + d) / N, where N = a + b + c + d is the total number of observations.
  • Pe (Expected Agreement): The proportion of agreement expected by chance based on the marginal totals.
    Formula: Pe = [(a + b) / N × (a + c) / N] + [(c + d) / N × (b + d) / N]
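The formula can be sketched as a small function that takes the four cell counts directly, independent of any page elements. The name `cohenKappa` and the choice to return NaN when Pe equals 1 are assumptions of this sketch; in that case every item falls in a single category for both raters and the formula becomes 0 / 0, so other conventions are possible.

```javascript
// Hypothetical helper: Cohen's Kappa from the four cells of a 2×2 table.
function cohenKappa(a, b, c, d) {
  const n = a + b + c + d;
  if (n === 0) {
    throw new Error("At least one observation is required.");
  }
  // Observed agreement: proportion of items on the agreeing diagonal.
  const po = (a + d) / n;
  // Expected agreement: product of matching marginal proportions,
  // summed over both categories.
  const pe = ((a + b) / n) * ((a + c) / n) + ((c + d) / n) * ((b + d) / n);
  // Assumption of this sketch: report NaN when Kappa is undefined.
  const kappa = pe === 1 ? NaN : (po - pe) / (1 - pe);
  return { n, po, pe, kappa };
}

// Illustrative counts, not taken from the article.
const { po, pe, kappa } = cohenKappa(20, 5, 10, 15);
console.log(po.toFixed(2), pe.toFixed(2), kappa.toFixed(2)); // 0.70 0.50 0.40
```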

Step-by-Step Calculation Example

Imagine two psychiatrists are diagnosing 100 patients for a specific condition. Here are their results:

  • Both said "Positive": 45 (a)
  • Rater 1 "Positive", Rater 2 "Negative": 15 (b)
  • Rater 1 "Negative", Rater 2 "Positive": 5 (c)
  • Both said "Negative": 35 (d)

1. Calculate Total (N): 45 + 15 + 5 + 35 = 100.

2. Calculate Observed Agreement (Po):
(45 + 35) / 100 = 0.80

3. Calculate Expected Agreement (Pe):
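First, find the probability of each rater saying "Positive". Rater 1 did so 45 + 15 = 60 times (60/100) and Rater 2 did so 45 + 5 = 50 times (50/100). Chance Positive = 0.6 × 0.5 = 0.3.
Next, find the probability of each rater saying "Negative". Rater 1 did so 5 + 35 = 40 times (40/100) and Rater 2 did so 15 + 35 = 50 times (50/100). Chance Negative = 0.4 × 0.5 = 0.2.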
Pe = 0.3 + 0.2 = 0.50.

4. Calculate Kappa:
κ = (0.80 – 0.50) / (1 – 0.50) = 0.30 / 0.50 = 0.60.
A Kappa of 0.60 sits at the upper edge of the "Moderate" band on the Landis and Koch scale.
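The worked example can be checked in code. The sketch below uses an algebraically equivalent form of the formula, obtained by multiplying the numerator and denominator of (Po – Pe) / (1 – Pe) by N², so that everything up to the final division is integer arithmetic. The function name `kappaFromCounts` is a hypothetical choice.

```javascript
// Hypothetical helper: Cohen's Kappa computed from raw counts.
// With S = (a+b)(a+c) + (c+d)(b+d), Po = (a+d)/N and Pe = S/N²,
// so kappa = (N(a+d) - S) / (N² - S).
function kappaFromCounts(a, b, c, d) {
  const n = a + b + c + d;
  // Sum of products of matching marginal totals.
  const s = (a + b) * (a + c) + (c + d) * (b + d);
  // Returns NaN (0 / 0) when there is no data or Kappa is undefined.
  return (n * (a + d) - s) / (n * n - s);
}

// Counts from the psychiatrist example: a = 45, b = 15, c = 5, d = 35.
console.log(kappaFromCounts(45, 15, 5, 35)); // 0.6
```

Evaluating (0.80 - 0.50) / (1 - 0.50) directly in floating point can give a value a hair above 0.6, which is one reason to work from counts when the result is later compared against fixed thresholds.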

Interpreting Your Kappa Score

According to Landis and Koch (1977), Kappa values are interpreted as follows:

Kappa Value (κ)    Strength of Agreement
< 0.00             Poor (less than chance)
0.00 to 0.20       Slight
0.21 to 0.40       Fair
0.41 to 0.60       Moderate
0.61 to 0.80       Substantial
0.81 to 1.00       Almost Perfect
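The bands above can also be expressed as a lookup table. This is a sketch under two assumptions that do not come from Landis and Koch: the names `LANDIS_KOCH_BANDS` and `interpretKappa` are hypothetical, and the score is rounded to two decimals before the lookup, because the table is stated to two decimals and a computed value such as 0.6000000000000001 would otherwise land in the next band.

```javascript
// Upper bounds of each band, taken from the table above.
const LANDIS_KOCH_BANDS = [
  { upTo: 0.20, label: "Slight" },
  { upTo: 0.40, label: "Fair" },
  { upTo: 0.60, label: "Moderate" },
  { upTo: 0.80, label: "Substantial" },
  { upTo: 1.00, label: "Almost Perfect" },
];

// Hypothetical helper: map a Kappa score to its strength-of-agreement label.
function interpretKappa(kappa) {
  if (Number.isNaN(kappa)) {
    throw new RangeError("Kappa is undefined for this table.");
  }
  // Check the sign on the raw value so that small negatives stay "Poor".
  if (kappa < 0) return "Poor (less than chance)";
  // Assumption: compare at the two-decimal precision the table uses.
  const rounded = Math.round(kappa * 100) / 100;
  const band = LANDIS_KOCH_BANDS.find((entry) => rounded <= entry.upTo);
  return band ? band.label : "Almost Perfect";
}

console.log(interpretKappa(0.6000000000000001)); // Moderate
console.log(interpretKappa(0.61));               // Substantial
console.log(interpretKappa(-0.05));              // Poor (less than chance)
```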

Why is Inter-Rater Reliability Important?

In fields like psychology, medicine, and content moderation, data often relies on human judgment. If two doctors applying the same criteria cannot agree on a diagnosis, either the criteria are too ambiguous to apply consistently or the raters need more training. Calculating IRR helps validate that the classification system produces consistent results regardless of who performs the assessment.
