function calculateWordErrorRate() {
  // Get input values. Using || 0 for S, D, I treats empty inputs as zero errors.
  var s = parseInt(document.getElementById('wer_substitutions').value, 10) || 0;
  var d = parseInt(document.getElementById('wer_deletions').value, 10) || 0;
  var i = parseInt(document.getElementById('wer_insertions').value, 10) || 0;
  var n = parseInt(document.getElementById('wer_total_reference').value, 10);
  var resultDiv = document.getElementById('wer_result_output');
  resultDiv.style.display = "block";
  // Input validation: error counts cannot be negative
  if (s < 0 || d < 0 || i < 0) {
    resultDiv.innerHTML = 'Error counts cannot be negative.';
    return;
  }
  // Validate N: it must be a number greater than 0 to avoid division by zero
  if (isNaN(n) || n <= 0) {
    resultDiv.innerHTML = 'Please enter a valid total number of words in the reference transcript (N must be greater than 0).';
    return;
  }
  // Core WER calculation: (S + D + I) / N
  var totalErrors = s + d + i;
  var werDecimal = totalErrors / n;
  // Keep a numeric percentage for comparisons and a rounded string for display
  var werValue = werDecimal * 100;
  var werPercentage = werValue.toFixed(2);
  // Determine interpretation
  var interpretation = "";
  if (werValue < 5) {
    interpretation = 'Excellent';
  } else if (werValue < 15) {
    interpretation = 'Good';
  } else if (werValue < 25) {
    interpretation = 'Acceptable';
  } else {
    interpretation = 'Needs Improvement';
  }
  // Display result
  resultDiv.innerHTML = '<h4>Calculation Results</h4>' +
    '<p>Total Errors Found (S+D+I): ' + totalErrors + '</p>' +
    '<p>Word Error Rate: ' + werPercentage + '%</p>' +
    '<p>General Interpretation: ' + interpretation + '</p>' +
    '<p>Note: WER can exceed 100% if insertion errors are very high.</p>';
}
Understanding Word Error Rate (WER)
Word Error Rate (WER) is the standard metric used to evaluate the performance of Automatic Speech Recognition (ASR) systems and sometimes machine translation. It quantifies how accurate a generated transcript is compared to a "ground truth" or perfect reference transcript.
Unlike accuracy, where a higher percentage is better, WER is an error metric; therefore, a lower WER percentage indicates better performance.
The WER Formula
WER is calculated using the Levenshtein distance at the word level. The formula compares the recognized sequence against the reference sequence to count three types of errors:
Substitutions (S): When the system recognizes the wrong word (e.g., "cat" instead of "hat").
Deletions (D): When the system omits a word that was in the audio.
Insertions (I): When the system adds a word that was not in the audio.
The sum of these errors is divided by the total number of words in the original reference transcript (N).
WER = (Substitutions + Deletions + Insertions) / Total Reference Words
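To make the word-level Levenshtein computation concrete, here is a minimal sketch in JavaScript. The function name wordErrorRate and its arguments are illustrative (they are not part of the calculator above); it uses standard dynamic programming to find the minimum combined count of substitutions, deletions, and insertions, then divides by N. Real WER tooling typically normalizes case and punctuation first; this sketch assumes the strings are already normalized:
function wordErrorRate(reference, hypothesis) {
  var ref = reference.trim().split(/\s+/);
  var hyp = hypothesis.trim().split(/\s+/);
  // dp[i][j] = minimum edits to turn the first i reference words
  // into the first j hypothesis words
  var dp = [];
  for (var i = 0; i <= ref.length; i++) {
    dp[i] = [];
    for (var j = 0; j <= hyp.length; j++) {
      if (i === 0) {
        dp[i][j] = j; // empty reference: j insertions
      } else if (j === 0) {
        dp[i][j] = i; // empty hypothesis: i deletions
      } else {
        var substitution = dp[i - 1][j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
        var deletion = dp[i - 1][j] + 1;
        var insertion = dp[i][j - 1] + 1;
        dp[i][j] = Math.min(substitution, deletion, insertion);
      }
    }
  }
  // (S + D + I) / N
  return dp[ref.length][hyp.length] / ref.length;
}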
Example Calculation
Imagine the original audio said: "The quick brown fox jumps over the lazy dog." (N = 9 words).
An ASR system outputs: "The quick red fox jumps ___ the lazy frog cat."
We analyze the errors:
Substitutions (S): 2 ("red" for "brown", "frog" for "dog").
Deletions (D): 1 (missed the word "over").
Insertions (I): 1 (added the word "cat").
Total Errors = 2 + 1 + 1 = 4.
WER = 4 / 9 ≈ 0.4444, i.e. a WER of 44.44%.
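Feeding the example sentences into the wordErrorRate sketch above reproduces the same figure (the deleted word "over" is simply absent from the hypothesis string):
var ref = "The quick brown fox jumps over the lazy dog";
var hyp = "The quick red fox jumps the lazy frog cat";
console.log((wordErrorRate(ref, hyp) * 100).toFixed(2) + '%'); // "44.44%"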
Interpreting WER Scores
What constitutes a "good" WER depends heavily on the complexity of the audio data (background noise, accents, spontaneity of speech). However, general benchmarks often used in the industry are:
< 5%: Excellent (Near human parity for clear speech).
5% – 15%: Good (Usable for most applications without heavy editing).
15% – 25%: Acceptable (Requires noticeable human post-editing).
> 25%: Needs Improvement (Often indicates challenging audio or a poorly performing model).
It is important to note that because of insertions, it is mathematically possible for the WER to exceed 100% if the recognizer hallucinates many extra words.
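As a quick illustration using the same sketch, a two-word reference scored against a hypothesis containing three hallucinated words gives a WER of 150%:
// Reference has N = 2 words; the hypothesis inserts 3 extra words (I = 3).
// WER = 3 / 2 = 1.5, i.e. 150%.
console.log((wordErrorRate("hello world", "oh um hello there world") * 100).toFixed(2) + '%'); // "150.00%"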