Inter-Rater Reliability Calculator
Calculate Cohen's Kappa for categorical agreement between two raters.
Understanding Inter-Rater Reliability
Inter-rater reliability (IRR) is a statistical measure that quantifies the degree of agreement between different observers or raters who are assessing the same phenomenon. While "percent agreement" is a simple way to look at consistency, it can be misleading because it doesn't account for the agreement that might happen purely by chance.
What is Cohen's Kappa?
Cohen's Kappa (κ) is the standard measure of inter-rater reliability for categorical data rated by two observers. Rather than reporting raw agreement, it corrects the observed agreement for the agreement expected by chance alone, giving a more robust indicator of how consistently the raters apply the criteria being measured.
The Cohen's Kappa Formula
The formula for calculating Kappa is:

κ = (po − pe) / (1 − pe)

where:
- po: Relative observed agreement among raters.
- pe: Hypothetical probability of chance agreement.
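As a concrete illustration of the formula, here is a minimal Python sketch that computes κ from two raters' label lists. The function name `cohens_kappa` and the list-based input format are assumptions made for this example, not part of any particular library.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Compute Cohen's Kappa for two raters' categorical labels."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("Both raters must rate the same number of items.")
    n = len(ratings_a)

    # po: proportion of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # pe: chance agreement, summing the product of each rater's
    # marginal proportions over all categories.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    p_e = sum((counts_a[c] / n) * (counts_b[c] / n)
              for c in counts_a.keys() | counts_b.keys())

    return (p_o - p_e) / (1 - p_e)

# Example usage with illustrative labels:
# cohens_kappa(["Normal", "Abnormal", "Normal"], ["Normal", "Normal", "Normal"])
```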
Interpreting Your Results
According to the widely accepted Landis and Koch (1977) scale, Kappa values can be interpreted as follows:
| Kappa Value | Strength of Agreement |
|---|---|
| < 0.00 | Poor (Less than chance) |
| 0.00 – 0.20 | Slight |
| 0.21 – 0.40 | Fair |
| 0.41 – 0.60 | Moderate |
| 0.61 – 0.80 | Substantial |
| 0.81 – 1.00 | Almost Perfect |
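If you want the interpretation step in code, the following helper (the name `interpret_kappa` is hypothetical) maps a Kappa value onto the Landis and Koch bands from the table above.

```python
def interpret_kappa(kappa):
    """Return the Landis and Koch (1977) descriptive label for a Kappa value."""
    if kappa < 0.00:
        return "Poor (Less than chance)"
    if kappa <= 0.20:
        return "Slight"
    if kappa <= 0.40:
        return "Fair"
    if kappa <= 0.60:
        return "Moderate"
    if kappa <= 0.80:
        return "Substantial"
    return "Almost Perfect"
```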
Calculation Example
Imagine two doctors diagnosing 100 X-rays as "Normal" or "Abnormal".
- Both agree "Normal": 70 cases
- Both agree "Abnormal": 15 cases
- Dr. A says Normal, Dr. B says Abnormal: 10 cases
- Dr. A says Abnormal, Dr. B says Normal: 5 cases
In this scenario, the observed agreement (po) is 0.85. Using each doctor's marginal totals (Dr. A: 80 Normal / 20 Abnormal; Dr. B: 75 Normal / 25 Abnormal), the expected chance agreement is pe = 0.80 × 0.75 + 0.20 × 0.25 = 0.65, so κ = (0.85 − 0.65) / (1 − 0.65) ≈ 0.57. Despite the seemingly high 85% raw agreement, this corresponds to only moderate diagnostic consistency between the two physicians.
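To check this worked example, here is a short Python sketch that reproduces the calculation directly from the 2×2 counts; the variable names are illustrative.

```python
# Counts from the X-ray scenario above.
both_normal, both_abnormal = 70, 15
a_normal_b_abnormal, a_abnormal_b_normal = 10, 5
n = both_normal + both_abnormal + a_normal_b_abnormal + a_abnormal_b_normal  # 100

# Observed agreement.
p_o = (both_normal + both_abnormal) / n  # 0.85

# Marginal proportions for each doctor.
a_normal = (both_normal + a_normal_b_abnormal) / n  # 0.80
b_normal = (both_normal + a_abnormal_b_normal) / n  # 0.75
p_e = a_normal * b_normal + (1 - a_normal) * (1 - b_normal)  # 0.65

kappa = (p_o - p_e) / (1 - p_e)
print(round(kappa, 3))  # 0.571 -> "Moderate" on the Landis and Koch scale
```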