Word Error Rate Calculator



Understanding Word Error Rate (WER)

Word Error Rate (WER) is the standard metric used to evaluate the performance of Automatic Speech Recognition (ASR) systems and sometimes machine translation. It quantifies how accurate a generated transcript is compared to a "ground truth" or perfect reference transcript.

Unlike accuracy, where a higher percentage is better, WER is an error metric; therefore, a lower WER percentage indicates better performance.

The WER Formula

WER is calculated using the Levenshtein distance at the word level. The formula compares the recognized sequence against the reference sequence to count three types of errors:

  • Substitutions (S): When the system recognizes the wrong word (e.g., "cat" instead of "hat").
  • Deletions (D): When the system omits a word that was in the audio.
  • Insertions (I): When the system adds a word that was not in the audio.

The sum of these errors is divided by the total number of words in the original reference transcript (N).

WER = (Substitutions + Deletions + Insertions) / Total Reference Words
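The formula maps directly to a few lines of code. The sketch below is illustrative (the function and parameter names are not from any particular library); it guards against N = 0, since dividing by an empty reference is undefined.

```python
# A minimal sketch of the WER formula: (S + D + I) / N.
# Function and parameter names here are illustrative.
def word_error_rate(substitutions, deletions, insertions, n_reference):
    if n_reference <= 0:
        raise ValueError("N (reference word count) must be greater than 0")
    return (substitutions + deletions + insertions) / n_reference

# e.g. 3 substitutions, 1 deletion, 2 insertions against a 50-word reference:
print(round(word_error_rate(3, 1, 2, 50) * 100, 2))  # prints 12.0
```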

Example Calculation

Imagine the original audio said: "The quick brown fox jumps over the lazy dog." (N = 9 words).

An ASR system outputs: "The quick red fox jumps ___ the lazy frog cat."

We analyze the errors:

  • Substitutions (S): 2 ("red" for "brown", "frog" for "dog").
  • Deletions (D): 1 (missed the word "over").
  • Insertions (I): 1 (added the word "cat").

Total Errors = 2 + 1 + 1 = 4.

WER = 4 / 9 ≈ 0.4444, i.e. a WER of 44.44%.
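In practice the S, D, and I counts come from a word-level Levenshtein alignment, not from manual inspection. The sketch below (a standard dynamic-programming edit distance with a backtrace; not any specific library's implementation) reproduces the example above:

```python
# Word-level WER sketch: Levenshtein distance with backtracking
# to attribute each edit to a substitution, deletion, or insertion.
def wer_counts(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    R, H = len(ref), len(hyp)
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (H + 1) for _ in range(R + 1)]
    for i in range(1, R + 1):
        d[i][0] = i                               # all deletions
    for j in range(1, H + 1):
        d[0][j] = j                               # all insertions
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            if ref[i - 1] == hyp[j - 1]:
                d[i][j] = d[i - 1][j - 1]
            else:
                d[i][j] = 1 + min(d[i - 1][j - 1],  # substitution
                                  d[i - 1][j],      # deletion
                                  d[i][j - 1])      # insertion
    # Backtrack along an optimal path, counting each edit type.
    subs = dels = ins = 0
    i, j = R, H
    while i > 0 or j > 0:
        if i > 0 and j > 0 and ref[i - 1] == hyp[j - 1]:
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + 1:
            subs += 1; i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            dels += 1; i -= 1
        else:
            ins += 1; j -= 1
    return subs, dels, ins

ref = "the quick brown fox jumps over the lazy dog"
hyp = "the quick red fox jumps the lazy frog cat"
s, d, i = wer_counts(ref, hyp)
wer = (s + d + i) / len(ref.split())
print(s, d, i, round(wer * 100, 2))  # prints 2 1 1 44.44
```

Note that the alignment matters: naively comparing the two sentences position by position would count five substitutions, but the optimal alignment finds the cheaper explanation of two substitutions, one deletion, and one insertion.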

Interpreting WER Scores

What constitutes a "good" WER depends heavily on the complexity of the audio data (background noise, accents, spontaneity of speech). However, general benchmarks often used in the industry are:

  • < 5%: Excellent (Near human parity for clear speech).
  • 5% – 15%: Good (Usable for most applications without heavy editing).
  • 15% – 25%: Acceptable (Requires noticeable human post-editing).
  • > 25%: Needs Improvement (Often indicates challenging audio or a poorly performing model).

It is important to note that because of insertions, it is mathematically possible for the WER to exceed 100% if the recognizer hallucinates many extra words.
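The benchmark bands above can be captured in a small helper. This is only a heuristic mapping of the rough industry thresholds listed earlier (the function name is illustrative), not a standard scale:

```python
def interpret_wer(wer_percent):
    # Bands mirror the rough benchmarks described above; treat them as
    # heuristics, since a "good" WER depends heavily on the audio domain.
    if wer_percent < 5:
        return "Excellent"
    if wer_percent < 15:
        return "Good"
    if wer_percent < 25:
        return "Acceptable"
    return "Needs Improvement"  # also covers WER above 100%

print(interpret_wer(44.44))  # prints Needs Improvement
```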
