Inter-Rater Reliability Calculator
Calculate Cohen's Kappa (κ) for binary categorical data between two raters.
What is Inter-Rater Reliability?
Inter-rater reliability (IRR) is a statistical measure that quantifies the degree of agreement between two or more independent coders or observers. In research, simply calculating the percentage of agreement is often insufficient because it does not account for the agreement that occurs purely by chance. This is where Cohen's Kappa becomes essential.
How to Calculate Cohen's Kappa
Cohen's Kappa (κ) is used for categorical data where two raters classify items into mutually exclusive categories. The formula is:
κ = (Po – Pe) / (1 – Pe)
- Po (Observed Agreement): The proportion of items on which the raters actually agreed.
- Pe (Expected Agreement): The proportion of agreement expected by random chance based on the marginal totals.
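To make the formula concrete, here is a minimal Python sketch that computes κ for the binary (Yes/No) case from the four cells of a 2×2 agreement table. The function and argument names are illustrative placeholders for this page, not part of any particular statistics library.

```python
def cohens_kappa(both_yes, rater1_only_yes, rater2_only_yes, both_no):
    """Cohen's kappa for two raters and two categories (Yes/No).

    Arguments are the four cells of the 2x2 agreement table:
    both_yes        -- both raters said Yes
    rater1_only_yes -- rater 1 said Yes, rater 2 said No
    rater2_only_yes -- rater 1 said No, rater 2 said Yes
    both_no         -- both raters said No
    """
    n = both_yes + rater1_only_yes + rater2_only_yes + both_no

    # Po: proportion of items the raters actually agreed on
    po = (both_yes + both_no) / n

    # Marginal proportion of "Yes" ratings for each rater
    p1_yes = (both_yes + rater1_only_yes) / n
    p2_yes = (both_yes + rater2_only_yes) / n

    # Pe: agreement expected by chance, based on the marginal totals
    pe = p1_yes * p2_yes + (1 - p1_yes) * (1 - p2_yes)

    return (po - pe) / (1 - pe)
```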
Interpretation of Results
Based on the widely accepted scale by Landis and Koch (1977), here is how to interpret your Kappa score:
| Kappa Statistic | Strength of Agreement |
|---|---|
| < 0.00 | Poor (Less than chance) |
| 0.00 – 0.20 | Slight Agreement |
| 0.21 – 0.40 | Fair Agreement |
| 0.41 – 0.60 | Moderate Agreement |
| 0.61 – 0.80 | Substantial Agreement |
| 0.81 – 1.00 | Almost Perfect Agreement |
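If you also want the verbal label alongside the numeric score, a simple lookup like the sketch below mirrors the table above; the function name is purely illustrative.

```python
def interpret_kappa(kappa):
    """Map a kappa value to its Landis and Koch (1977) label."""
    if kappa < 0.00:
        return "Poor (Less than chance)"
    if kappa <= 0.20:
        return "Slight Agreement"
    if kappa <= 0.40:
        return "Fair Agreement"
    if kappa <= 0.60:
        return "Moderate Agreement"
    if kappa <= 0.80:
        return "Substantial Agreement"
    return "Almost Perfect Agreement"
```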
Real-World Example
Imagine two doctors diagnosing 100 patients for a specific condition (Yes/No). If they both agree "Yes" for 45 patients and "No" for 35 patients, but disagree on the remaining 20, their observed agreement (Po) is 80%. However, if one doctor says "Yes" 60% of the time and the other says "Yes" 50% of the time, the chance agreement (Pe) must be calculated to see whether that 80% reflects genuine agreement or is largely what chance alone would produce; the sketch below works through the numbers.
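Using the hypothetical counts from this example (45 Yes–Yes, 35 No–No, 15 cases where only the first doctor said Yes, and 5 where only the second did), the calculation runs as follows:

```python
# Hypothetical counts from the example above (100 patients total)
po = (45 + 35) / 100                 # observed agreement = 0.80
p1_yes, p2_yes = 60 / 100, 50 / 100  # marginal "Yes" rates: 60% and 50%

# Chance agreement from the marginals: 0.30 + 0.20 = 0.50
pe = p1_yes * p2_yes + (1 - p1_yes) * (1 - p2_yes)

kappa = (po - pe) / (1 - pe)         # (0.80 - 0.50) / 0.50 = 0.60
print(round(kappa, 2))               # 0.6 -> "Moderate Agreement"
```

So although the doctors agree on 80% of patients, half of that agreement is expected by chance alone, and the kappa of 0.60 sits only at the top of the "Moderate Agreement" band.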