Inter-Rater Reliability Calculator

Cohen's Kappa Calculator (2×2)

Enter the number of items (counts) for each agreement/disagreement scenario below.

Agreement Counts (Confusion Matrix)

Both Raters said "YES" (a)

Rater A "YES" / Rater B "NO" (b)

Rater A "NO" / Rater B "YES" (c)

Both Raters said "NO" (d)

Analysis Result

The calculator reports Cohen's Kappa (κ) with an interpretation of the score, the total number of observations (N), the observed agreement (Po), and the expected agreement (Pe).

How to Calculate Inter-Rater Reliability in Excel (and Online)

Inter-Rater Reliability (IRR) is a crucial statistical measure used to assess the degree of agreement between different judges, coders, or raters. While a simple percentage agreement is easy to calculate, it is often misleading because it does not account for the possibility of raters agreeing by random chance.

A widely used IRR metric for categorical data (Yes/No, Pass/Fail) scored by two raters is Cohen's Kappa (κ). This guide explains how to calculate it with the calculator above and walks through the manual steps for the same analysis in Excel.

What is Cohen's Kappa?

Cohen's Kappa measures inter-rater reliability between two raters for qualitative (categorical) items by correcting the observed agreement for the agreement expected by chance. It ranges from -1 to +1, where:

0 indicates agreement equivalent to random chance.
1 indicates perfect agreement.
Negative values indicate less agreement than random chance alone would produce.

Step-by-Step: How to Calculate Inter-Rater Reliability in Excel

If you prefer using spreadsheets, follow this process to build your own Cohen's Kappa calculator in Excel. Assume you have two columns of data: Column A contains Rater 1's scores, and Column B contains Rater 2's scores (using "1" for Yes and "0" for No).

1. Create a Contingency Table (Confusion Matrix)

You need to count the four possible scenarios. Set up a 2×2 table in your Excel sheet (e.g., cells D2:E3) representing:

	Rater 2: Yes (1)	Rater 2: No (0)
Rater 1: Yes (1)	(a) Both Yes	(b) Rater 1 Yes, Rater 2 No
Rater 1: No (0)	(c) Rater 1 No, Rater 2 Yes	(d) Both No

2. Use COUNTIFS Formulas

Use Excel formulas to fill these cells based on your raw data in columns A and B:

Cell (a): =COUNTIFS(A:A, 1, B:B, 1)
Cell (b): =COUNTIFS(A:A, 1, B:B, 0)
Cell (c): =COUNTIFS(A:A, 0, B:B, 1)
Cell (d): =COUNTIFS(A:A, 0, B:B, 0)
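If your ratings live outside Excel, the same four counts can be produced in a few lines of Python. This is a minimal sketch, assuming each rater's scores are a list of 1 (Yes) and 0 (No) values paired by item; the function name `contingency_counts` is hypothetical, not part of any library.

```python
from typing import Sequence, Tuple


def contingency_counts(rater1: Sequence[int], rater2: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return the 2x2 cell counts (a, b, c, d), mirroring the COUNTIFS formulas.

    Assumes rater1[i] and rater2[i] are the two scores for item i, coded 1 = Yes, 0 = No.
    Pairs containing any other value fall into none of the four cells.
    """
    if len(rater1) != len(rater2):
        raise ValueError("both raters must score the same items")
    pairs = list(zip(rater1, rater2))
    a = pairs.count((1, 1))  # both Yes            ~ =COUNTIFS(A:A, 1, B:B, 1)
    b = pairs.count((1, 0))  # Rater 1 Yes, 2 No   ~ =COUNTIFS(A:A, 1, B:B, 0)
    c = pairs.count((0, 1))  # Rater 1 No, 2 Yes   ~ =COUNTIFS(A:A, 0, B:B, 1)
    d = pairs.count((0, 0))  # both No             ~ =COUNTIFS(A:A, 0, B:B, 0)
    return a, b, c, d


# Hypothetical ratings for six items
rater1 = [1, 1, 0, 0, 1, 0]
rater2 = [1, 0, 0, 1, 1, 0]
print(contingency_counts(rater1, rater2))  # (2, 1, 1, 2)
```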

3. Calculate Observed Agreement (Po)

First, calculate the total number of observations (N) by summing the four cells, e.g. =SUM(D2:E3) if your table occupies D2:E3.

Then, calculate the proportion of times the raters actually agreed:

Po = (a + d) / N
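As a quick check of the arithmetic, here is the same calculation as a small Python function. The counts in the example (a = 20, b = 5, c = 10, d = 15) are made up for illustration, and `observed_agreement` is a hypothetical helper name.

```python
def observed_agreement(a: int, b: int, c: int, d: int) -> float:
    """Po = (a + d) / N, the share of items on which both raters gave the same score."""
    n = a + b + c + d  # total observations, like =SUM over the four cells
    if n == 0:
        raise ValueError("no observations")
    return (a + d) / n


# Illustrative counts: 20 + 15 agreements out of N = 50 items
print(observed_agreement(20, 5, 10, 15))  # 0.7
```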

4. Calculate Expected Agreement (Pe)

This is the probability that they would agree by random chance. You must calculate the marginal probabilities for "Yes" and "No".

Prob Rater 1 says Yes: P1_Yes = (a + b) / N
Prob Rater 2 says Yes: P2_Yes = (a + c) / N
Chance Agreement Yes: Chance_Yes = P1_Yes * P2_Yes

Repeat for "No":

Prob Rater 1 says No: P1_No = (c + d) / N
Prob Rater 2 says No: P2_No = (b + d) / N
Chance Agreement No: Chance_No = P1_No * P2_No

Total Expected Agreement: Pe = Chance_Yes + Chance_No
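The marginal-probability steps translate directly into code. The following is a minimal Python sketch using made-up counts (a = 20, b = 5, c = 10, d = 15); the helper name `expected_agreement` is hypothetical.

```python
def expected_agreement(a: int, b: int, c: int, d: int) -> float:
    """Pe: chance agreement computed from each rater's marginal Yes/No rates."""
    n = a + b + c + d
    if n == 0:
        raise ValueError("no observations")
    p1_yes = (a + b) / n  # Rater 1 says Yes (row total / N)
    p2_yes = (a + c) / n  # Rater 2 says Yes (column total / N)
    p1_no = (c + d) / n   # Rater 1 says No
    p2_no = (b + d) / n   # Rater 2 says No
    chance_yes = p1_yes * p2_yes
    chance_no = p1_no * p2_no
    return chance_yes + chance_no


# Illustrative counts: P1_Yes = 0.5, P2_Yes = 0.6, so Pe = 0.5*0.6 + 0.5*0.4
print(round(expected_agreement(20, 5, 10, 15), 4))  # 0.5
```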

5. Calculate Kappa

Finally, apply Cohen's Kappa formula in a new cell:

= (Po - Pe) / (1 - Pe)
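Putting steps 3 to 5 together, a compact Python version might look like the sketch below. It folds the marginal products into one expression, Pe = ((a + b)(a + c) + (c + d)(b + d)) / N², which is algebraically the same as Chance_Yes + Chance_No. The function name `cohens_kappa` is hypothetical, and the guard for Pe = 1 is an assumption about how to handle that edge case: when both raters put every item into the same single category, Pe equals 1 and the formula becomes 0 / 0.

```python
def cohens_kappa(a: int, b: int, c: int, d: int) -> float:
    """Cohen's Kappa for a 2x2 table: (Po - Pe) / (1 - Pe)."""
    n = a + b + c + d
    if n == 0:
        raise ValueError("no observations")
    po = (a + d) / n                                       # observed agreement
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2  # expected (chance) agreement
    if pe == 1:
        # Both raters used one identical category for every item: 0 / 0.
        raise ValueError("Kappa is undefined when Pe = 1")
    return (po - pe) / (1 - pe)


# Illustrative counts: Po = 0.7, Pe = 0.5, so Kappa = 0.2 / 0.5
print(round(cohens_kappa(20, 5, 10, 15), 4))  # 0.4
```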

Interpreting Your Results

Once you have your Kappa score from the calculator above or your Excel sheet, use this standard scale (Landis & Koch, 1977) to interpret the level of agreement:

Below 0.00: Poor Agreement
0.00 – 0.20: Slight Agreement
0.21 – 0.40: Fair Agreement
0.41 – 0.60: Moderate Agreement
0.61 – 0.80: Substantial Agreement
0.81 – 1.00: Almost Perfect Agreement
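If you want the label applied automatically in a script, the scale can be encoded as a simple threshold lookup. This Python sketch assumes Kappa is first rounded to two decimals so that values fall cleanly into the ranges listed above, and it labels negative values "Poor Agreement", the term Landis and Koch use below 0. The function name `landis_koch_label` is hypothetical.

```python
def landis_koch_label(kappa: float) -> str:
    """Map a Kappa value to its Landis & Koch (1977) agreement band."""
    k = round(kappa, 2)  # assumption: bands are read at two-decimal precision
    if not -1.0 <= k <= 1.0:
        raise ValueError("Kappa must lie between -1 and 1")
    if k < 0:
        return "Poor Agreement"
    bands = [
        (0.20, "Slight Agreement"),
        (0.40, "Fair Agreement"),
        (0.60, "Moderate Agreement"),
        (0.80, "Substantial Agreement"),
    ]
    for upper, label in bands:  # first band whose upper bound covers k
        if k <= upper:
            return label
    return "Almost Perfect Agreement"


print(landis_koch_label(0.3999999999999999))  # Fair Agreement
print(landis_koch_label(0.85))                # Almost Perfect Agreement
```

Rounding first keeps floating-point results such as 0.3999999999999999 in the band a reader would expect from the two-decimal table.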

Why not just use Percentage Agreement?

Percentage agreement is simply Po = (a + d) / N. While intuitive, it can overstate reliability because it ignores chance. For example, two raters who each label about 90% of items "No" independently of the content would still agree on roughly 82% of them (0.9 × 0.9 + 0.1 × 0.1 = 0.82). Cohen's Kappa corrects for this chance agreement, giving a more rigorous check of your data coding process.
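To see the effect numerically, the Python sketch below compares percentage agreement and Kappa for a hypothetical, heavily imbalanced table in which each rater says "No" to 94 of 100 items. The counts are invented for illustration and the helper name `agreement_and_kappa` is hypothetical.

```python
from typing import Tuple


def agreement_and_kappa(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Return (percentage agreement Po, Cohen's Kappa) for a 2x2 table."""
    n = a + b + c + d
    po = (a + d) / n
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return po, (po - pe) / (1 - pe)


# Hypothetical counts: the raters agree on 90 of 100 items, almost all of them "No"
po, kappa = agreement_and_kappa(a=1, b=5, c=5, d=89)
print(f"Percentage agreement: {po:.0%}")  # Percentage agreement: 90%
print(f"Cohen's Kappa: {kappa:.2f}")      # Cohen's Kappa: 0.11
```

Here chance alone accounts for Pe = 0.8872, so 90% agreement is only slightly better than what the raters' Yes/No rates would produce by luck, and Kappa lands in the "Slight" band.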
