Evaluating Machine Translation: How to Measure Success

Global organizations need reliable communication between employees, vendors, sales teams, and customers. Growing businesses want to be able to expand into new territories. International helpdesks need to provide resources to consumers around the world.

Machine translation (MT) translates text far faster than human translators can, and its accuracy keeps improving. Because MT systems learn from data, their output can continue to improve as they are retrained, and their value is evident across industries. So it’s no surprise that the global machine translation market reached USD 812.6 million in 2021 and is projected to reach USD 4,069.5 million by 2030.

Machine translation helps businesses save time and money, minimizes the risks related to language barriers, and provides improved customer experiences.

Accuracy in communication is critical, especially if you are dealing with legal, medical, or technical content. But how effective is your MT system? Is its output sufficient for its intended purpose? By evaluating the quality of your translation through recognized metrics, you can ensure you are meeting recipients’ expectations. In addition, evaluation results give developers the feedback they need to refine systems and offer better MT overall.

Effective MT Across Industries

Machine translation is becoming the norm across industries. From gaming to government agencies and from medicine to manufacturing, organizations must communicate across language barriers.


In e-commerce, 76% of buyers prefer purchasing products they can read about in their native language. That means higher sales for businesses translating website product descriptions accurately.

Medical device and pharmaceutical companies often rely on accurate translation for complex documentation, where misunderstandings could mean the difference between life and death.

Government agencies often need to provide documents and reference material in the country’s official language and languages that meet the needs of immigrants and non-native speakers.

Manufacturing and technology companies rely on up-to-date MT to keep up with changing technology and industry jargon.

Metrics for evaluation help organizations ensure the quality of their translation meets the needs of businesses and their audiences and flag difficulties that may need human intervention.

Which Metrics Should Businesses Use to Evaluate MT Systems?

Regardless of the industry, evaluating the effectiveness of MT output is essential. Businesses can evaluate their MT systems using human assessment or AI-powered tools.

Manual human evaluation by professional translators is the gold standard for high-quality evaluation of MT output, capturing the nuances of language use and meaning. However, assessments can vary between evaluators, and since the process is time-consuming, human evaluation may not be the most cost-effective option for every business.

Automated evaluation metrics provide numerical scores reflecting the quality of the MT output as compared to a human-generated translation. Businesses can use them to evaluate machine translation quickly, and they can handle large amounts of data. Several accepted metrics are available. Each uses different calculations to measure the accuracy of the MT and gives different weights to various types of errors.

Three commonly used metrics are:

Bilingual Evaluation Understudy (BLEU)

BLEU is the most widely used automatic metric and was designed to correlate with human judgments of translation quality. It compares the MT output to one or more reference translations and assigns a score from 0 to 1 (often reported as 0 to 100) based on n-gram precision, with a penalty for output that is too short. A higher score indicates closer overlap with the reference text. BLEU’s strengths are its speed and ease of use.
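To make the n-gram comparison concrete, here is a minimal sketch of a sentence-level BLEU-style score in Python. It is a simplification under stated assumptions: whitespace tokenization, a single reference, no smoothing, and a hypothetical function name (simple_bleu). Established tools such as sacreBLEU handle tokenization, multiple references, and corpus-level scoring, so treat this only as an illustration of the idea.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(hypothesis, reference, max_n=4):
    """Simplified sentence-level BLEU (illustrative, not a reference implementation).

    Assumptions: whitespace tokenization, one reference, no smoothing.
    """
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    if not hyp:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: an n-gram counts at most as often as it occurs in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives a zero score
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: output shorter than the reference is penalized.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat"
print(simple_bleu("the cat sat on the mat", reference))  # identical text scores 1.0
print(simple_bleu("the cat sat on a mat", reference))    # one wrong word lowers the score
```

Because this sketch applies no smoothing, short sentences with no matching 4-gram score zero, which is one reason BLEU is usually reported over a whole test set rather than a single sentence.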

Translation Edit Rate (TER)

Also called the Translation Error Rate, the TER metric assigns a score based on the number of edits that would be required to match the MT output exactly to the reference text. This helps determine how much post-editing would need to be performed after MT was completed. A lower score means less post-editing will be necessary.
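The edit-counting idea can be sketched in a few lines of Python. This is a simplified approximation, not the full metric: it counts word-level insertions, deletions, and substitutions and divides by the reference length, but it omits the block "shift" edit that the complete TER definition also allows. The function names (word_edit_distance, simple_ter) are illustrative.

```python
def word_edit_distance(hyp, ref):
    """Minimum number of word insertions, deletions, and substitutions
    needed to turn the token list hyp into the token list ref."""
    prev = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        curr = [i]
        for j, ref_word in enumerate(ref, start=1):
            cost = 0 if hyp_word == ref_word else 1
            curr.append(min(
                prev[j] + 1,         # delete a word from the MT output
                curr[j - 1] + 1,     # insert a missing reference word
                prev[j - 1] + cost,  # keep or substitute a word
            ))
        prev = curr
    return prev[-1]


def simple_ter(hypothesis, reference):
    """Simplified TER: edits divided by reference length (no shift edits)."""
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hyp, ref) / len(ref)


reference = "the cat sat on the mat"
print(simple_ter("the cat sat on the mat", reference))  # 0.0, no edits needed
print(simple_ter("the cat sat", reference))             # 0.5, three of six words missing
```

Read as a rough post-editing estimate, a score of 0.5 here means about half as many word edits as there are words in the reference.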

Metric for Evaluation of Translation with Explicit Ordering (METEOR)

This metric calculates unigram precision and recall. It aligns words between the MT output and the reference using exact word matching, stemming, and synonym matching, and it analyzes text at the sentence level. The resulting score indicates how closely the MT output matches the human-translated reference.
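The precision-and-recall core of this approach can be illustrated with a short Python sketch. It is deliberately incomplete: it uses exact word matching only and leaves out stemming, synonym matching, and the word-order penalty that the full metric applies. The function name and the recall_weight parameter are illustrative; the default of 9 follows the recall-heavy weighting described for the original METEOR formulation.

```python
from collections import Counter


def unigram_f_score(hypothesis, reference, recall_weight=9.0):
    """Recall-weighted harmonic mean of unigram precision and recall.

    Simplification: exact matches only (no stemming, synonyms, or
    word-order penalty), whitespace tokenization, one reference.
    """
    hyp = Counter(hypothesis.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the smaller count of each shared word.
    matches = sum((hyp & ref).values())
    if matches == 0:
        return 0.0
    precision = matches / sum(hyp.values())  # share of MT words found in the reference
    recall = matches / sum(ref.values())     # share of reference words the MT covered
    w = recall_weight
    return (1 + w) * precision * recall / (recall + w * precision)


reference = "the cat sat on the mat"
print(unigram_f_score("the cat sat on the mat", reference))  # 1.0 for identical text
print(unigram_f_score("the cat", reference))                 # low: most reference words are missing
```

Weighting recall more heavily than precision means output that drops content from the reference is punished more than output that adds a few extra words.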

Other metrics are available, including NIST, LEPOR, COMET, and PRIS. While automatic metrics are imperfect and may not match human judgment, they are powerful, cost-effective tools. In addition, they are fast, straightforward to use, and need no human intervention beyond the reference translations they compare against.

How Can Businesses Use Metrics to Improve the Effectiveness of their MT systems?

Each method of MT evaluation comes with strengths and drawbacks. Businesses will choose the metric or combination that best meets their goals and budget. However, consistent monitoring and evaluation are essential to keep up with evolving language, industry-specific terminology, and MT capabilities.

Organizations can then use these results to move their businesses forward in a variety of ways, such as:

Identifying areas for improvement

If the MT system consistently makes a specific grammatical error or struggles with a particular language, the business can prioritize those areas when retraining or fine-tuning its models.

Goal-setting based on metrics

Some audiences may require more attention to detail than others. Therefore, a business may wish to define a target metric to meet within a specific time frame.

Comparing multiple MT systems

Companies evaluating different systems or working with multiple vendors can make informed purchasing decisions based on their industries and use cases.
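As a rough illustration of such a comparison, the Python sketch below averages a score for each system over the same set of reference translations and ranks the results. Everything here is hypothetical: the system names, the sample sentences, and the similarity function, which is a simple stand-in built on the standard library's difflib rather than a recognized MT metric. In practice the metric argument would be replaced with BLEU, TER, METEOR, or another established score.

```python
from difflib import SequenceMatcher
from statistics import mean


def similarity(hypothesis, reference):
    """Stand-in metric (not a recognized MT metric): word-level
    similarity ratio between 0 and 1, higher is better."""
    return SequenceMatcher(None, hypothesis.lower().split(),
                           reference.lower().split()).ratio()


def rank_systems(outputs_by_system, references, metric=similarity):
    """Average a higher-is-better metric per system and sort best first."""
    averages = {}
    for name, outputs in outputs_by_system.items():
        if len(outputs) != len(references):
            raise ValueError(f"{name}: expected {len(references)} outputs")
        averages[name] = mean(metric(h, r) for h, r in zip(outputs, references))
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical test set and system outputs, for illustration only.
references = ["the cat sat on the mat", "please restart the device"]
outputs_by_system = {
    "system_a": ["the cat sat on the mat", "please restart the device"],
    "system_b": ["a cat is on mat", "restart device please now"],
}
for name, score in rank_systems(outputs_by_system, references):
    print(f"{name}: {score:.3f}")
```

Scoring every system against the same references keeps the comparison fair. For a lower-is-better metric such as TER, the sort order would need to be reversed.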

By setting realistic goals, analyzing the evaluation metrics, and tracking progress toward goals over time, businesses can ensure their MT system meets their specific needs and provides the high-quality translations they require. When they consider these metrics along with user feedback and preferences, they can ensure they are meeting their business goals and customer expectations.
