Inter-Rater Reliability Calculator (Cohen's Kappa for 2 Raters)
Enter the number of observations for each agreement/disagreement scenario in the 2×2 contingency table below.
Rater 2 Says "Yes"
Rater 2 Says "No"
Rater 1 Says "Yes"
Rater 1 Says "No"
function calculateKappa() {
// 1. Get input values
var a = parseFloat(document.getElementById('cell_a').value);
var b = parseFloat(document.getElementById('cell_b').value);
var c = parseFloat(document.getElementById('cell_c').value);
var d = parseFloat(document.getElementById('cell_d').value);
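// 2. Validate inputs: treat blank, non-numeric, or negative entries as 0 (counts cannot be negative)
if (isNaN(a) || a < 0) a = 0;
if (isNaN(b) || b < 0) b = 0;
if (isNaN(c) || c < 0) c = 0;
if (isNaN(d) || d < 0) d = 0;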
var total = a + b + c + d;
var resultArea = document.getElementById('result-area');
// Edge case: No data entered
if (total === 0) {
alert("Please enter at least one observation greater than zero.");
resultArea.style.display = 'none';
return;
}
// 3. Calculate Observed Agreement (Po)
// Po = (a + d) / Total
var observedAgreement = (a + d) / total;
// 4. Calculate Expected Agreement (Pe) based on chance
// Marginal Totals
var rater1_yes = a + b;
var rater1_no = c + d;
var rater2_yes = a + c;
var rater2_no = b + d;
// Probability of Yes and No by chance
var prob_yes = (rater1_yes / total) * (rater2_yes / total);
var prob_no = (rater1_no / total) * (rater2_no / total);
var expectedAgreement = prob_yes + prob_no;
// 5. Calculate Cohen's Kappa (k)
// k = (Po - Pe) / (1 - Pe)
var kappa = 0;
if (expectedAgreement === 1) {
// Edge case: Pe = 1 (every observation falls in a single cell), so the formula would divide by zero
kappa = (observedAgreement === 1) ? 1 : 0;
} else {
kappa = (observedAgreement - expectedAgreement) / (1 - expectedAgreement);
}
// 6. Interpret the score (Landis & Koch, 1977)
var interpretation = "";
if (kappa < 0) {
interpretation = "Poor Agreement (Less than chance)";
} else if (kappa <= 0.20) {
interpretation = "Slight Agreement";
} else if (kappa <= 0.40) {
interpretation = "Fair Agreement";
} else if (kappa <= 0.60) {
interpretation = "Moderate Agreement";
} else if (kappa <= 0.80) {
interpretation = "Substantial Agreement";
} else {
interpretation = "Almost Perfect Agreement";
}
// 7. Update DOM
document.getElementById('total-n').innerText = total;
document.getElementById('po-val').innerText = (observedAgreement * 100).toFixed(2) + "%";
document.getElementById('pe-val').innerText = (expectedAgreement * 100).toFixed(2) + "%";
document.getElementById('kappa-display').innerText = "κ = " + kappa.toFixed(3);
document.getElementById('interpretation-display').innerText = interpretation;
resultArea.style.display = 'block';
}
How to Calculate Inter-Rater Reliability (Cohen's Kappa)
Inter-rater reliability (IRR) measures the degree of agreement between two or more judges, or raters, who assess the same items. It quantifies how much consensus there is in their ratings. It is also useful for refining the tools given to human raters, for example by determining whether a particular scale is appropriate for measuring a particular variable.
A simple "Percent Agreement" calculation shows how often two raters agreed, but it does not account for agreement that would occur by chance alone, for example when both raters are guessing. This calculator uses Cohen's Kappa, which corrects the observed agreement for the agreement expected by chance and is therefore generally considered a more robust measure.
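To make the difference concrete, the sketch below compares two hypothetical 2×2 tables that share the same percent agreement but differ sharply in kappa. The counts and the helper names `percentAgreement` and `kappaFromCounts` are illustrative assumptions, not data from a real study; the cell labels a, b, c, d follow the contingency table described in this article.

```javascript
// Illustrative sketch: same percent agreement, very different kappa.
// Cell labels: a = both "Yes", b = Rater 1 "Yes" / Rater 2 "No",
// c = Rater 1 "No" / Rater 2 "Yes", d = both "No".

function percentAgreement(a, b, c, d) {
  return (a + d) / (a + b + c + d);
}

function kappaFromCounts(a, b, c, d) {
  const total = a + b + c + d;
  const po = (a + d) / total;
  const pe = ((a + b) / total) * ((a + c) / total) +
             ((c + d) / total) * ((b + d) / total);
  // Kappa is undefined when pe is exactly 1; this sketch returns NaN in that case.
  return pe === 1 ? NaN : (po - pe) / (1 - pe);
}

// Hypothetical table 1: ratings split evenly between "Yes" and "No".
const balanced = [45, 5, 5, 45];
// Hypothetical table 2: both raters say "Yes" almost every time.
const skewed = [90, 5, 5, 0];

for (const [label, cells] of [["balanced", balanced], ["skewed", skewed]]) {
  console.log(
    label,
    "percent agreement:", percentAgreement(...cells).toFixed(2),
    "kappa:", kappaFromCounts(...cells).toFixed(3)
  );
}
```

Both tables show 90% agreement, yet kappa is about 0.80 for the balanced table and slightly below zero for the skewed one, because nearly all of the agreement in the second table is what chance alone would produce.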
Understanding the 2×2 Contingency Table
To calculate Kappa for two raters classifying items into two categories (e.g., "Yes/No" or "Present/Absent"), we organize the data into a grid:
Cell a: Both Rater 1 and Rater 2 said "Yes".
Cell b: Rater 1 said "Yes", but Rater 2 said "No".
Cell c: Rater 1 said "No", but Rater 2 said "Yes".
Cell d: Both Rater 1 and Rater 2 said "No".
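If the data starts out as item-by-item ratings rather than cell counts, the four cells can be tallied first. The following is a minimal sketch under stated assumptions: the function name `tallyRatings`, the string labels "Yes" and "No", and the sample ratings are all hypothetical.

```javascript
// Illustrative sketch: tally two raters' paired ratings into cells a-d.
// Assumes each rating is the string "Yes" or "No"; any value other than
// "Yes" is counted as "No".
function tallyRatings(rater1, rater2) {
  if (rater1.length !== rater2.length) {
    throw new Error("Both raters must rate the same number of items.");
  }
  const cells = { a: 0, b: 0, c: 0, d: 0 };
  for (let i = 0; i < rater1.length; i++) {
    const r1Yes = rater1[i] === "Yes";
    const r2Yes = rater2[i] === "Yes";
    if (r1Yes && r2Yes) cells.a++;        // both said "Yes"
    else if (r1Yes && !r2Yes) cells.b++;  // Rater 1 "Yes", Rater 2 "No"
    else if (!r1Yes && r2Yes) cells.c++;  // Rater 1 "No", Rater 2 "Yes"
    else cells.d++;                       // both said "No"
  }
  return cells;
}

// Hypothetical ratings for five items.
const rater1 = ["Yes", "Yes", "No", "No", "Yes"];
const rater2 = ["Yes", "No", "Yes", "No", "Yes"];
console.log(tallyRatings(rater1, rater2)); // { a: 2, b: 1, c: 1, d: 1 }
```

The resulting counts can be entered directly into the four calculator fields.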
The Formula
Cohen's Kappa ($\kappa$) is calculated using the following formula:
$\kappa = (P_o - P_e) / (1 - P_e)$
Where:
Po (Observed Agreement): The actual proportion of times the raters agreed. Formula: $P_o = (a + d) / N$, where $N = a + b + c + d$ is the total number of observations.
Pe (Expected Agreement): The proportion of agreement expected by chance based on the marginal totals. Formula: $P_e = \frac{(a + b)(a + c) + (c + d)(b + d)}{N^2}$
Step-by-Step Calculation Example
Imagine two psychiatrists are diagnosing 100 patients for a specific condition. Here are their results:
3. Calculate Expected Agreement (Pe):
First, find the probability of Rater 1 saying Yes (60/100) and Rater 2 saying Yes (50/100). Chance Yes = 0.6 * 0.5 = 0.3.
Next, find the probability of Rater 1 saying No (40/100) and Rater 2 saying No (50/100). Chance No = 0.4 * 0.5 = 0.2.
Pe = 0.3 + 0.2 = 0.50.
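The same arithmetic can be checked in code. This is a minimal sketch that takes only the marginal "Yes" totals quoted in this step (60 and 50 out of 100); the function name `expectedAgreement` and its parameter names are assumptions made for illustration.

```javascript
// Illustrative sketch: expected chance agreement (Pe) from marginal "Yes" totals.
function expectedAgreement(rater1Yes, rater2Yes, total) {
  // Chance that both raters say "Yes" if they rate independently.
  const chanceYes = (rater1Yes / total) * (rater2Yes / total);
  // Chance that both raters say "No" if they rate independently.
  const chanceNo = ((total - rater1Yes) / total) * ((total - rater2Yes) / total);
  return chanceYes + chanceNo;
}

// Marginals from the example: Rater 1 said "Yes" 60 times,
// Rater 2 said "Yes" 50 times, out of 100 patients.
const pe = expectedAgreement(60, 50, 100);
console.log(pe.toFixed(2)); // "0.50"
```

Note that whenever one rater splits the items exactly 50/50, Pe works out to 0.50 regardless of the other rater's marginals.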
According to Landis and Koch (1977), Kappa values are interpreted as follows:
Kappa Value (κ): Strength of Agreement
< 0.00: Poor (less than chance)
0.00 to 0.20: Slight
0.21 to 0.40: Fair
0.41 to 0.60: Moderate
0.61 to 0.80: Substantial
0.81 to 1.00: Almost Perfect
Why is Inter-Rater Reliability Important?
In fields like psychology, medicine, and content moderation, data often relies on human judgment. If two doctors applying the same criteria cannot agree on a diagnosis, the diagnostic method cannot be considered reliable. Calculating IRR helps verify that the classification system produces consistent results regardless of who performs the assessment.