Enter the number of observations for two raters classifying items into two categories (e.g., Yes/No, Positive/Negative).
What is Inter-Rater Reliability?
Inter-rater reliability (IRR) is a statistical measure of the degree of agreement among independent raters. It quantifies how much homogeneity, or consensus, exists in the ratings given by judges. It is useful in refining the tools given to human judges, for example, by determining whether a particular scale is appropriate for measuring a particular variable.
Understanding Cohen's Kappa
While calculating simple percent agreement is easy, it can be misleading because it does not account for the agreement that could occur purely by chance. Cohen's Kappa ($\kappa$) is a robust statistic that corrects for chance agreement.
The formula for Cohen's Kappa is:
$$\kappa = \frac{P_o - P_e}{1 - P_e}$$
- $P_o$ (Observed Agreement): The proportion of times the raters agreed (sum of the diagonal cells divided by the total).
- $P_e$ (Expected Agreement): The proportion of agreement expected by chance, based on the marginal totals.
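As a minimal sketch of the calculation (the function name and argument order are illustrative; the four arguments are the cells of the 2×2 agreement table, matching the A–D layout used in the example below), the formula translates into Python as:

```python
def cohens_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 agreement table.

    a: both raters chose category 1 (e.g., Yes/Yes)
    b: rater 1 chose category 1, rater 2 chose category 2
    c: rater 1 chose category 2, rater 2 chose category 1
    d: both raters chose category 2 (e.g., No/No)
    """
    n = a + b + c + d
    # Observed agreement: sum of the diagonal cells over the total
    p_o = (a + d) / n
    # Expected agreement: for each category, multiply the two raters'
    # marginal totals; sum these products and divide by n squared
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_o - p_e) / (1 - p_e)
```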
Interpreting the Results
Cohen's Kappa ranges from -1 to +1, where 0 represents the amount of agreement that can be expected from random chance, and 1 represents perfect agreement between the raters. The interpretation table commonly used (Landis & Koch, 1977) is:
- < 0: Poor agreement (Less than chance)
- 0.00 – 0.20: Slight agreement
- 0.21 – 0.40: Fair agreement
- 0.41 – 0.60: Moderate agreement
- 0.61 – 0.80: Substantial agreement
- 0.81 – 1.00: Almost perfect agreement
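If you need to turn a kappa value into one of these labels programmatically, a small helper (hypothetical, but following the Landis & Koch bands above) might look like:

```python
def interpret_kappa(kappa):
    """Map a kappa value to its Landis & Koch (1977) label."""
    if kappa < 0:
        return "Poor agreement (less than chance)"
    bands = [
        (0.20, "Slight agreement"),
        (0.40, "Fair agreement"),
        (0.60, "Moderate agreement"),
        (0.80, "Substantial agreement"),
        (1.00, "Almost perfect agreement"),
    ]
    # Return the label of the first band whose upper bound covers kappa
    for upper_bound, label in bands:
        if kappa <= upper_bound:
            return label
    raise ValueError("kappa must lie in [-1, 1]")
```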
Example Scenario
Imagine two radiologists are examining 100 X-rays to detect a fracture.
- They both agree there is a fracture in 40 cases (A).
- Radiologist 1 sees a fracture, but Radiologist 2 does not in 10 cases (B).
- Radiologist 1 does not see a fracture, but Radiologist 2 does in 5 cases (C).
- They both agree there is no fracture in 45 cases (D).
Using the calculator above, you would enter 40, 10, 5, and 45 into the respective fields. The calculator compares the observed agreement with the agreement expected by chance to quantify the reliability of their diagnostic protocol.
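Working the numbers through by hand (a useful check on any calculator's output):

$$P_o = \frac{40 + 45}{100} = 0.85$$

$$P_e = \frac{(40+10)(40+5) + (5+45)(10+45)}{100^2} = \frac{2250 + 2750}{10000} = 0.50$$

$$\kappa = \frac{0.85 - 0.50}{1 - 0.50} = 0.70$$

A kappa of 0.70 falls in the "substantial agreement" band of the Landis & Koch scale.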
Why Use This Calculator?
Manually calculating the expected agreement ($P_e$) involves finding the marginal sums for both raters and both categories, multiplying them, and dividing by the square of the total. This tool automates the process, ensuring accuracy for research papers, quality assurance testing, and clinical data analysis.
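Tying together the two sketch functions from earlier (both hypothetical) on the radiologist example reproduces the hand calculation:

```python
kappa = cohens_kappa(40, 10, 5, 45)
print(f"kappa = {kappa:.2f}")   # kappa = 0.70
print(interpret_kappa(kappa))   # Substantial agreement
```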