.irr-calculator-container {
font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Helvetica, Arial, sans-serif;
max-width: 800px;
margin: 20px auto;
padding: 30px;
background-color: #f9fafb;
border: 1px solid #e5e7eb;
border-radius: 12px;
box-shadow: 0 4px 6px rgba(0, 0, 0, 0.05);
}
.irr-calc-title {
text-align: center;
color: #1f2937;
margin-bottom: 25px;
font-size: 24px;
font-weight: 700;
}
.irr-grid {
display: grid;
grid-template-columns: 1fr 1fr;
gap: 20px;
margin-bottom: 25px;
}
.irr-input-group {
display: flex;
flex-direction: column;
}
.irr-input-group label {
font-size: 14px;
font-weight: 600;
color: #4b5563;
margin-bottom: 8px;
}
.irr-input-group input {
padding: 12px;
border: 1px solid #d1d5db;
border-radius: 6px;
font-size: 16px;
transition: border-color 0.2s;
}
.irr-input-group input:focus {
border-color: #3b82f6;
outline: none;
box-shadow: 0 0 0 3px rgba(59, 130, 246, 0.1);
}
.irr-matrix-header {
grid-column: 1 / -1;
background: #e0e7ff;
padding: 10px;
border-radius: 6px;
text-align: center;
font-weight: bold;
color: #3730a3;
margin-bottom: 10px;
}
.irr-btn {
width: 100%;
padding: 14px;
background-color: #2563eb;
color: white;
border: none;
border-radius: 6px;
font-size: 16px;
font-weight: 600;
cursor: pointer;
transition: background-color 0.2s;
}
.irr-btn:hover {
background-color: #1d4ed8;
}
.irr-result-box {
margin-top: 25px;
padding: 20px;
background-color: #ffffff;
border: 1px solid #e5e7eb;
border-radius: 8px;
display: none;
}
.irr-result-row {
display: flex;
justify-content: space-between;
padding: 8px 0;
border-bottom: 1px solid #f3f4f6;
}
.irr-result-row:last-child {
border-bottom: none;
}
.irr-final-score {
text-align: center;
font-size: 32px;
font-weight: 800;
color: #2563eb;
margin: 15px 0;
}
.irr-interpretation {
text-align: center;
font-size: 18px;
font-weight: 500;
color: #059669;
padding: 10px;
background: #ecfdf5;
border-radius: 6px;
}
.article-content {
max-width: 800px;
margin: 40px auto;
font-family: "Segoe UI", Roboto, Helvetica, Arial, sans-serif;
line-height: 1.7;
color: #374151;
}
.article-content h2 {
color: #111827;
margin-top: 30px;
border-bottom: 2px solid #e5e7eb;
padding-bottom: 10px;
}
.article-content h3 {
color: #4b5563;
margin-top: 20px;
}
.article-content code {
background: #f3f4f6;
padding: 2px 6px;
border-radius: 4px;
font-family: monospace;
color: #dc2626;
}
.article-content table {
width: 100%;
border-collapse: collapse;
margin: 20px 0;
}
.article-content th, .article-content td {
border: 1px solid #d1d5db;
padding: 10px;
text-align: left;
}
.article-content th {
background-color: #f9fafb;
}
/* Responsive adjustments */
@media (max-width: 600px) {
.irr-grid {
grid-template-columns: 1fr;
}
}
function calculateKappa() {
// 1. Get DOM elements
var in_yy = document.getElementById("agree_yy");
var in_yn = document.getElementById("disagree_yn");
var in_ny = document.getElementById("disagree_ny");
var in_nn = document.getElementById("agree_nn");
var out_box = document.getElementById("irrResult");
var out_kappa = document.getElementById("kappaScore");
var out_interp = document.getElementById("kappaInterp");
var out_total = document.getElementById("totalObs");
var out_po = document.getElementById("poVal");
var out_pe = document.getElementById("peVal");
// 2. Parse values (Default to 0 if empty)
var a = parseFloat(in_yy.value) || 0;
var b = parseFloat(in_yn.value) || 0;
var c = parseFloat(in_ny.value) || 0;
var d = parseFloat(in_nn.value) || 0;
// 3. Calculate Totals
var total = a + b + c + d;
if (a < 0 || b < 0 || c < 0 || d < 0) {
alert("Counts cannot be negative.");
return;
}
if (total === 0) {
alert("Please enter at least one observation.");
return;
}
// 4. Calculate Observed Agreement (Po)
var observedAgreement = (a + d) / total;
// 5. Calculate Expected Agreement (Pe)
// Marginal probabilities
var raterA_Yes = (a + b) / total;
var raterA_No = (c + d) / total;
var raterB_Yes = (a + c) / total;
var raterB_No = (b + d) / total;
// Chance of random agreement
var chanceYes = raterA_Yes * raterB_Yes;
var chanceNo = raterA_No * raterB_No;
var expectedAgreement = chanceYes + chanceNo;
// 6. Calculate Kappa (k)
var kappa = 0;
// Handle perfect agreement edge case to avoid div by zero if Pe=1
if (expectedAgreement === 1) {
kappa = (observedAgreement === 1) ? 1 : 0;
} else {
kappa = (observedAgreement - expectedAgreement) / (1 - expectedAgreement);
}
// 7. Determine Interpretation
var interpretation = "";
var interpColor = "";
var bgColor = "";
if (kappa < 0) {
interpretation = "Poor Agreement (Less than chance)";
interpColor = "#b91c1c"; // Red
bgColor = "#fef2f2";
} else if (kappa <= 0.20) {
interpretation = "Slight Agreement";
interpColor = "#9a3412"; // Orange-Red
bgColor = "#fff7ed";
} else if (kappa <= 0.40) {
interpretation = "Fair Agreement";
interpColor = "#b45309"; // Orange
bgColor = "#fffbeb";
} else if (kappa <= 0.60) {
interpretation = "Moderate Agreement";
interpColor = "#0d9488"; // Teal
bgColor = "#f0fdfa";
} else if (kappa <= 0.80) {
interpretation = "Substantial Agreement";
interpColor = "#059669"; // Green
bgColor = "#ecfdf5";
} else {
interpretation = "Almost Perfect Agreement";
interpColor = "#15803d"; // Dark Green
bgColor = "#f0fdf4";
}
// 8. Update UI
out_box.style.display = "block";
out_kappa.innerText = kappa.toFixed(3);
out_interp.innerText = interpretation;
out_interp.style.color = interpColor;
out_interp.style.backgroundColor = bgColor;
out_total.innerText = total;
out_po.innerText = (observedAgreement * 100).toFixed(2) + "%";
out_pe.innerText = (expectedAgreement * 100).toFixed(2) + "%";
}
How to Calculate Inter-Rater Reliability in Excel (and Online)
Inter-Rater Reliability (IRR) measures how consistently different judges, coders, or raters score the same items. Simple percentage agreement is easy to calculate, but it is often misleading because it does not account for raters agreeing by random chance.
For two raters assigning categorical labels (Yes/No, Pass/Fail), the standard IRR metric is Cohen's Kappa (κ). This guide explains how to calculate it using the calculator above and details the manual steps to perform the same analysis in Excel.
What is Cohen's Kappa?
Cohen's Kappa measures inter-rater reliability for qualitative (categorical) items while correcting for the agreement expected by chance. It ranges from -1 to +1, where:
- 0 indicates agreement equivalent to random chance.
- 1 indicates perfect agreement.
- Negative values indicate disagreement worse than random chance.
Step-by-Step: How to Calculate Inter-Rater Reliability in Excel
If you prefer using spreadsheets, follow this process to build your own Cohen's Kappa calculator in Excel. Assume you have two columns of data: Column A contains Rater 1's scores, and Column B contains Rater 2's scores (using "1" for Yes and "0" for No).
1. Create a Contingency Table (Confusion Matrix)
You need to count the four possible scenarios. Set up a 2×2 table in your Excel sheet (e.g., cells D2:E3) representing:
| | Rater 2: Yes (1) | Rater 2: No (0) |
| --- | --- | --- |
| Rater 1: Yes (1) | (a) Both Yes | (b) Rater 1 Yes, Rater 2 No |
| Rater 1: No (0) | (c) Rater 1 No, Rater 2 Yes | (d) Both No |
2. Use COUNTIFS Formulas
Use Excel formulas to fill these cells based on your raw data in columns A and B:
- Cell (a): =COUNTIFS(A:A, 1, B:B, 1)
- Cell (b): =COUNTIFS(A:A, 1, B:B, 0)
- Cell (c): =COUNTIFS(A:A, 0, B:B, 1)
- Cell (d): =COUNTIFS(A:A, 0, B:B, 0)
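The same tally can be sketched outside Excel. The JavaScript below is a minimal illustration, assuming the ratings are held in two equal-length arrays of 1 (Yes) and 0 (No). The function name buildContingencyTable and the sample ratings are hypothetical and not part of the calculator script on this page.

```javascript
// Hypothetical helper: tally the 2x2 contingency table from two
// equal-length arrays of 1 (Yes) / 0 (No) ratings.
function buildContingencyTable(rater1, rater2) {
  if (rater1.length !== rater2.length) {
    throw new Error("Both raters must score the same number of items.");
  }
  const table = { a: 0, b: 0, c: 0, d: 0 };
  for (let i = 0; i < rater1.length; i++) {
    const r1 = rater1[i];
    const r2 = rater2[i];
    if (r1 === 1 && r2 === 1) table.a++;       // (a) both Yes
    else if (r1 === 1 && r2 === 0) table.b++;  // (b) Rater 1 Yes, Rater 2 No
    else if (r1 === 0 && r2 === 1) table.c++;  // (c) Rater 1 No, Rater 2 Yes
    else if (r1 === 0 && r2 === 0) table.d++;  // (d) both No
    else throw new Error("Ratings must be 1 or 0 (item " + (i + 1) + ").");
  }
  return table;
}

// Made-up sample ratings, for illustration only.
const sampleRater1 = [1, 1, 0, 0, 1, 0, 1, 0];
const sampleRater2 = [1, 0, 0, 0, 1, 1, 1, 0];
console.log(buildContingencyTable(sampleRater1, sampleRater2));
// { a: 3, b: 1, c: 1, d: 3 }
```

Unlike COUNTIFS, which silently skips rows that match none of the four conditions, this sketch stops with an error when it meets a value other than 1 or 0. That is a design choice for catching data-entry mistakes, not a requirement of the method.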
3. Calculate Observed Agreement (Po)
First, calculate the total number of observations (N) by summing the four cells, for example =SUM(D2:E3) if the table occupies D2:E3.
Then, calculate the proportion of times the raters actually agreed:
Po = (a + d) / N
4. Calculate Expected Agreement (Pe)
This is the probability that they would agree by random chance. You must calculate the marginal probabilities for "Yes" and "No".
- Prob Rater 1 says Yes: P1_Yes = (a + b) / N
- Prob Rater 2 says Yes: P2_Yes = (a + c) / N
- Chance Agreement Yes: Chance_Yes = P1_Yes * P2_Yes
Repeat for "No":
- Prob Rater 1 says No: P1_No = (c + d) / N
- Prob Rater 2 says No: P2_No = (b + d) / N
- Chance Agreement No: Chance_No = P1_No * P2_No
Total Expected Agreement: Pe = Chance_Yes + Chance_No
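The marginal-probability arithmetic in this step can be checked with a short script. The sketch below assumes the four cell counts a, b, c, and d are already known. The function name expectedAgreement and the sample counts (20, 5, 10, 15) are illustrative only.

```javascript
// Hypothetical helper: expected (chance) agreement Pe from the four cell counts.
function expectedAgreement(a, b, c, d) {
  const n = a + b + c + d;
  if (n === 0) {
    throw new Error("At least one observation is required.");
  }
  // Marginal probabilities for each rater
  const p1Yes = (a + b) / n;
  const p2Yes = (a + c) / n;
  const p1No = (c + d) / n;
  const p2No = (b + d) / n;
  // Chance of both saying Yes plus chance of both saying No
  return p1Yes * p2Yes + p1No * p2No;
}

// Made-up counts: Rater 1 says Yes on 25 of 50 items, Rater 2 on 30 of 50.
// Pe = 0.5 * 0.6 + 0.5 * 0.4
console.log(expectedAgreement(20, 5, 10, 15).toFixed(3)); // "0.500"
```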
5. Calculate Kappa
Finally, apply Cohen's Kappa formula in a new cell:
= (Po - Pe) / (1 - Pe)
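Steps 3 to 5 can be combined into one function. The sketch below is a minimal version, assuming the four cell counts are already known. The name cohenKappa and the sample counts are illustrative, and the handling of Pe = 1 mirrors the convention used by the calculator script on this page rather than a statistical standard.

```javascript
// Hypothetical helper: Cohen's Kappa for a 2x2 table of counts.
function cohenKappa(a, b, c, d) {
  const n = a + b + c + d;
  if (n === 0) {
    throw new Error("At least one observation is required.");
  }
  // Step 3: observed agreement
  const po = (a + d) / n;
  // Step 4: expected agreement from the marginal probabilities
  const pe = ((a + b) / n) * ((a + c) / n) + ((c + d) / n) * ((b + d) / n);
  // Assumption: when Pe is exactly 1 the formula divides by zero, so this
  // follows the calculator's convention (1 if Po is also 1, otherwise 0).
  if (pe === 1) {
    return po === 1 ? 1 : 0;
  }
  // Step 5: Kappa
  return (po - pe) / (1 - pe);
}

// Made-up counts: Po = 35/50 = 0.7, Pe = 0.5, so Kappa = 0.2 / 0.5.
console.log(cohenKappa(20, 5, 10, 15).toFixed(3)); // "0.400"
```

With these sample counts the raters agree on 70% of items, yet Kappa is only 0.4 because half of that agreement would be expected by chance.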
Interpreting Your Results
Once you have your Kappa score from the calculator above or your Excel sheet, use this standard scale (Landis & Koch, 1977) to interpret the level of agreement:
- Below 0.00: Poor Agreement (less than chance)
- 0.00 – 0.20: Slight Agreement
- 0.21 – 0.40: Fair Agreement
- 0.41 – 0.60: Moderate Agreement
- 0.61 – 0.80: Substantial Agreement
- 0.81 – 1.00: Almost Perfect Agreement
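The scale can also be expressed as a small lookup, separate from any page elements. The sketch below assumes the same cut-offs as the calculator script (each band includes its upper bound). The names LANDIS_KOCH_BANDS and interpretKappa are hypothetical.

```javascript
// Upper bounds of each band, in ascending order (assumed inclusive).
const LANDIS_KOCH_BANDS = [
  { upTo: 0.20, label: "Slight Agreement" },
  { upTo: 0.40, label: "Fair Agreement" },
  { upTo: 0.60, label: "Moderate Agreement" },
  { upTo: 0.80, label: "Substantial Agreement" },
  { upTo: 1.00, label: "Almost Perfect Agreement" }
];

// Hypothetical helper: map a Kappa value to its descriptive label.
function interpretKappa(kappa) {
  if (typeof kappa !== "number" || Number.isNaN(kappa) || kappa < -1 || kappa > 1) {
    throw new Error("Kappa must be a number between -1 and 1.");
  }
  if (kappa < 0) {
    return "Poor Agreement (less than chance)";
  }
  // The first band whose upper bound is not exceeded
  const band = LANDIS_KOCH_BANDS.find(function (b) { return kappa <= b.upTo; });
  return band.label;
}

console.log(interpretKappa(0.40)); // "Fair Agreement"
console.log(interpretKappa(0.41)); // "Moderate Agreement"
```

Because the bands are compared with <=, a value that falls between two listed ranges, such as 0.205, is assigned to the higher band ("Fair Agreement").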
Why not just use Percentage Agreement?
Percentage agreement is simply (a + d) / Total. While intuitive, it overstates reliability whenever one category dominates. For example, two raters who each answer "No" 90% of the time, regardless of the item in front of them, would still agree on roughly 82% of items by chance alone (0.9 × 0.9 + 0.1 × 0.1 = 0.82). Cohen's Kappa subtracts this chance agreement, giving a more rigorous check of your data coding process.
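The gap between the two measures is easy to see with skewed data. The sketch below uses made-up counts in which "No" dominates. The function name agreementStats is hypothetical, and the sketch assumes Pe is below 1 so the division is defined.

```javascript
// Hypothetical helper: return Po, Pe, and Kappa for a 2x2 table of counts.
// Assumes at least one observation and Pe < 1.
function agreementStats(a, b, c, d) {
  const n = a + b + c + d;
  const po = (a + d) / n;
  const pe = ((a + b) / n) * ((a + c) / n) + ((c + d) / n) * ((b + d) / n);
  return { po: po, pe: pe, kappa: (po - pe) / (1 - pe) };
}

// Made-up counts: 100 items, each rater says Yes on only 5 of them.
const stats = agreementStats(1, 4, 4, 91);
console.log("Percentage agreement: " + (stats.po * 100).toFixed(1) + "%"); // 92.0%
console.log("Expected by chance:   " + (stats.pe * 100).toFixed(1) + "%"); // 90.5%
console.log("Cohen's Kappa:        " + stats.kappa.toFixed(3));            // 0.158
```

Here 92% raw agreement corresponds to a Kappa of about 0.158, which falls in the "Slight Agreement" band of the scale above.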