10 Ways to Improve Your Translation Output

As Globalization 4.0 takes hold and the convergence of Industry 4.0 and remote work becomes commonplace in the business ecosystem, translation is an increasingly important component of productivity, engagement, and communication.

But how do you iron out the kinks? You need to communicate effectively with team members, colleagues, and customers across physical and linguistic borders. Unfortunately, there’s a bump in the road: language.

Translation engines allow you to communicate seamlessly across language barriers. But creating a well-oiled, hyper-engaging translation solution isn’t always easy. Obviously, the source of your engine is important. Modern Neural Machine Translation (NMT) uses neural networks to contextualize, digest, and output translations in fractions of a second.

Compared to the rule-based and statistical engines of the past, NMT allows you to create complex and highly effective translation infrastructure without multiple massive, highly researched bilingual dictionaries that take years to develop.

But beyond the source, there are still plenty of caveats that can impede the speed and accuracy of your translations. Here are 10 ways to instantly improve your translation output for faster, smarter, and easier communication.

1. Eliminate Ambiguity

While neural network-based machine translation is incredibly gifted at understanding context and meaning, it still isn’t perfect. A great example of this is ambiguity. When ambiguous words (i.e., words with multiple meanings or interpretations) are used frequently, the NMT system is forced to make a rapid-fire decision about the correct usage of the word across several translation candidates.

Often, this isn’t a problem with core language. But in niche industries and sectors that use words in unique ways or bring their own lexicon of ambiguous words, it can make translation tricky. Despite the state-of-the-art neural architecture of solutions like OpenNMT, lexically ambiguous sentences are still a barrier to accurate, constructive, and cohesive translations.

To be clear, it’s not that NMT can’t handle ambiguity or that neural networks aren’t getting more efficient at tackling this problem; they are. In fact, OpenNMT recently ranked fastest across multiple tasks in the WNGT 2020 Efficiency Shared Task, which takes BLEU scores into account. The issue is that ambiguity naturally lowers your BLEU score, especially when the ambiguous terms appear often in your content and belong to niche industries that feed minimal data into the neural network.

Eliminating ambiguity from your source can immediately improve speed and accuracy. So, we always recommend keeping ambiguous words to a minimum, unless they’re absolutely required for cohesive translation.
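One way to act on this is to scan your source text for known troublemakers before it reaches the engine. The sketch below is a minimal illustration, not a feature of any particular NMT product: the watch list, the function name `flag_ambiguous`, and the sample text are all assumptions made for the example, and the sentence splitter is deliberately naive.

```python
import re

# Hypothetical watch list: terms your own team has found ambiguous in its domain.
AMBIGUOUS_TERMS = {"bank", "skip", "charge", "lead"}

def flag_ambiguous(text, terms=AMBIGUOUS_TERMS):
    """Return (sentence_number, term) pairs for every watch-list hit."""
    hits = []
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for number, sentence in enumerate(sentences, start=1):
        for word in re.findall(r"[a-z']+", sentence.lower()):
            if word in terms:
                hits.append((number, word))
    return hits

sample = "Charge the lead before the shift. The bank approved the invoice."
print(flag_ambiguous(sample))
```

A report like this only tells an editor where to look. Whether a flagged word is actually ambiguous in context is still a human call.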

2. Remove Unnecessary Words

The more words you have in your source, the more your neural network must process, translate, and learn over time. By removing “filler” words, you can quickly increase speed and accuracy. But that’s only part of the story. Unnecessary words, no matter how easy they are to translate, add complexity for the end user.

Again, word density isn’t an inherent shortcoming of NMT. The opposite is true. NMT is far better at translating massive word banks than past solutions like phrase-based machine translation (PBMT) and rule-based machine translation (RBMT) engines. Still, minimizing unnecessary words does increase the speed of your translations. But, given the speed and accuracy of modern NMT-based translation engines, that’s a secondary concern (we’re talking about shaving off tenths of a second). The bigger issue is that unnecessary words put a strain on the end user and make effective communication needlessly complicated.
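If you want to trim filler before translation, a small substitution table is often enough to get started. The following is a rough sketch under stated assumptions: the `FILLERS` table and the `tighten` function are hypothetical names, and the entries are examples rather than a recommended list. Blind removal can change meaning (for instance, “just” as an adjective), so treat the output as a suggestion for an editor to review.

```python
import re

# Hypothetical filler table: phrase -> replacement ("" means drop it entirely).
FILLERS = {
    "in order to": "to",
    "basically": "",
    "really": "",
    "very": "",
    "just": "",
}

def tighten(sentence, fillers=FILLERS):
    """Apply the filler table, then tidy up leftover spacing."""
    result = sentence
    # Longest phrases first, so multi-word fillers are handled before single words.
    for phrase in sorted(fillers, key=len, reverse=True):
        pattern = rf"\b{re.escape(phrase)}\b"
        result = re.sub(pattern, fillers[phrase], result, flags=re.IGNORECASE)
    result = re.sub(r"\s{2,}", " ", result)           # collapse doubled spaces
    result = re.sub(r"\s+([,.;:!?])", r"\1", result)  # no space before punctuation
    return result.strip()

before = "You basically just need to restart the very old server in order to apply the update."
print(tighten(before))
```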


3. Use the Definite Article

This ties back to ambiguity. Certain nouns (e.g., skip, bank, etc.) can also double as verbs. Adding a definite article (i.e., “the”) before appropriate nouns can significantly reduce the load on your NMT solution. There has been plenty of research into machine translation’s difficulty with lexically ambiguous terms. In fact, Google Translate, which runs a relatively advanced NMT engine, has significant issues with ambiguous terms.

While solutions like OpenNMT (i.e., the core of SYSTRAN) do well with ambiguity, they’re not perfect. Adding definite articles can significantly increase accuracy and translation throughput. Simply using “the” immediately provides context to your NMT engine.

4. Use the Active Voice

The active voice keeps sentences clear, concise, and short. In traditional communication, the active voice is almost always the ideal way to communicate. In most cases, the passive voice adds words to a sentence (e.g., “was,” “by”) and draws out the length and complexity of the sentence.

Without getting too deep into aspectual verbs, their imperfective/perfective nature in other languages, and their impact on encoding, verb tense is simply important for clarity. Meaning can get lost in temporal (i.e., tense) translation. Keeping your structure active can improve translation quality for your NMT engine and increase translation digestibility for users.
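A quick heuristic can help you spot passive constructions in source text before translation. This is only a sketch: the pattern below (a form of “to be” followed by a word ending in “-ed” or one of a few irregular participles) and the name `passive_phrases` are assumptions made for illustration. It will miss some passives and flag some phrases that are not passive, so use it to guide a review rather than to rewrite automatically.

```python
import re

# A handful of irregular past participles; extend the list for your own content.
IRREGULAR = "written|sent|made|given|done|taken|shown|built"

# Heuristic: a form of "to be" followed by a likely past participle.
PASSIVE = re.compile(
    rf"\b(?:am|is|are|was|were|be|been|being)\s+(?:\w+ed|{IRREGULAR})\b",
    re.IGNORECASE,
)

def passive_phrases(text):
    """Return every phrase in the text that looks like a passive construction."""
    return [match.group(0) for match in PASSIVE.finditer(text)]

text = "The report was reviewed by the manager. The manager reviewed the report."
print(passive_phrases(text))
```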

5. Avoid Anaphora

Neural machine translation is effective at understanding extra-sentential information and extra-sentential dependencies (i.e., “context that exists outside the translated sentence”). However, engines still struggle (especially over a large body of work) to gender anaphoric references. This is often due to the grammar of the target language, which may require every noun and pronoun to carry a gender. So, let’s imagine that you have a large employee handbook. In the handbook, OSHA requirements are discussed. The first sentence uses the term “OSHA,” while subsequent sentences use the anaphoric reference “it.” In other words, “it” now refers to OSHA, which was introduced earlier and can only be understood from that context.

For the average human reader, this is immediately comprehensible. However, when your translation engine digests this information, it may have difficulty understanding how to gender “it” from English to another language, like French. Over time, this can cause issues. Let’s look at an example of how complex anaphora can get for your translation engine.

The man walked his dog; it was a Chihuahua.
The Chihuahua ate a dog biscuit; it was delicious.
The biscuit crumbs fell on a newspaper; it was the New York Times.

For the human reader, these sentences, back-to-back, are easy to comprehend. Your translation engine will have to figure out how to gender each “it” in the context of the sentence, which can quickly become complicated — as noted above.
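The usual fix is to repeat the noun instead of relying on the pronoun. As a minimal sketch, the helper below swaps each standalone “it” for an antecedent that an editor supplies. The function name `replace_it` is hypothetical, and the code makes no attempt to work out the antecedent on its own, since that is exactly the hard part described above.

```python
import re

def replace_it(sentence, antecedent):
    """Replace each standalone 'it' with the antecedent the editor supplies."""
    def swap(match):
        # Keep the capital letter when "It" opens a sentence.
        if match.group(0)[0].isupper():
            return antecedent[0].upper() + antecedent[1:]
        return antecedent
    # \b keeps the pattern from matching "it" inside words such as "biscuit".
    return re.sub(r"\b[Ii]t\b", swap, sentence)

print(replace_it("The man walked his dog; it was a Chihuahua.", "the dog"))
```

With the noun restated, the engine no longer has to look back across the sentence to decide which gender the pronoun should take.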

6. Stay Away from Unnecessary Context-based Language

Idioms, colloquialisms, abbreviations (especially ones that align with natural language words), and metaphors are all context-based. Words deviate from their traditional meaning. Again, removing these context-intensive structures helps both the engine and your users. Idioms and colloquialisms are notoriously lost in translation, and abbreviations and metaphors can slow down engine performance.

7. Make it Simple, Short, and Sweet

Deeper networks, multiple layers, and unique scripts can all make translating larger, more complex sentences a breeze, but avoiding those sentences in the first place is typically ideal. We understand that certain industries and niches (think legal) are forced to operate with longer sentence structures. That’s fine! But if you can avoid long sentences, do it. The simpler the structure, the more accurate your engine will be over time. More importantly, it makes translation more digestible for the end user.

Occasionally, we see companies spend time and energy maximizing NMT effectiveness via layers and deep neural networks to translate massive sentences with high BLEU scores. It’s possible. But doing so may impact the person the translation is meant for in the first place. NMT tech is incredible. And creating hyper-accurate NMT networks is awesome. However, we would recommend avoiding these long sentences for the users’ sake. Additionally, shorter sentences and simpler structures prevent you from having to build out more resource-intensive engines. It saves you money, time, and headaches, and it makes your content easier to comprehend for the end-user — which is the end goal of translation.
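A simple length check can show you which sentences are worth splitting. The sketch below is illustrative only: the 25-word threshold, the `MAX_WORDS` constant, and the `long_sentences` function are assumptions, not figures from any NMT vendor, and the sentence splitter is naive.

```python
import re

MAX_WORDS = 25  # hypothetical threshold; pick one that suits your content

def long_sentences(text, max_words=MAX_WORDS):
    """Return (word_count, sentence) for each sentence over the limit."""
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    counted = [(len(sentence.split()), sentence) for sentence in sentences]
    return [(count, sentence) for count, sentence in counted if count > max_words]

draft = (
    "Keep it short. This sentence, on the other hand, keeps going "
    "long after the point has already been made."
)
for count, sentence in long_sentences(draft, max_words=8):
    print(count, sentence)
```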

8. Control Terminology Consistency

This one is easy. Control the consistency of your terminology. Generally, terminology issues happen across three layers:

Inconsistency: This happens when two terms are used for the same concept.
Ambiguity: This happens when a single term is used for more than one concept.
Errors: This happens when the wrong terms are used for a concept.

We’ve already discussed ambiguity, and errors are straightforward. Inconsistency is the tricky one. A great example used by the Justice Department in their terminology inconsistency brief is the word “perimortem.” When doctors use the word perimortem, they’re almost always referring to the body. When forensic anthropologists use the word perimortem, they’re almost always referring to bone.

There are plenty of examples of inconsistent terminology out there, but (most often) we see inconsistent terminology come into play in the context of brand language. Certain products, niche industries, and marketing materials may use terms in a different way than they’re used traditionally. Try to avoid these cases.
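Inconsistency is also the easiest of the three to check mechanically. The sketch below assumes you keep a small term base that maps each preferred term to the variants you want to phase out. The `TERM_BASE` entries and the `find_inconsistencies` function are hypothetical examples, not part of any translation product.

```python
import re

# Hypothetical term base: preferred term -> variants that should not appear.
TERM_BASE = {
    "sign in": ["log in", "login", "log on"],
    "dashboard": ["control panel", "home screen"],
}

def find_inconsistencies(text, term_base=TERM_BASE):
    """Return {preferred_term: [variants found]} for each discouraged variant."""
    lowered = text.lower()
    report = {}
    for preferred, variants in term_base.items():
        found = [
            variant
            for variant in variants
            if re.search(rf"\b{re.escape(variant)}\b", lowered)
        ]
        if found:
            report[preferred] = found
    return report

doc = "Log in to open the dashboard. After login, the control panel shows your orders."
print(find_inconsistencies(doc))
```

Running a check like this over source content before translation keeps one concept tied to one term, which gives the engine a single phrase to learn.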

9. Specialize Your NMT Engines

Like humans, every NMT engine is unique. In the past, rule-based engines required months of language creation, years of study, and a plethora of experience to execute. Today, NMT cuts that time to days. The neural networking capabilities of NMT allow it to feed off past material while learning (nearly instantly) from new material and usage.

But it still requires deep-dive customization, especially if you want a hyper-specialized engine for niche industries.

Luckily, NMT makes this incredibly easy. For example, you can go in and quickly add glossaries and terminologies to your SYSTRAN engine. Combined with its already-powerful neural capabilities, this makes creating specialized engines rapid, accurate, and meaningful.

In other words, translation engines are an ongoing project, not a one-and-done solution. You may need names, acronyms, or subjects that don’t get translated for security reasons. Or you may need to go in and change terminologies as your industry grows and evolves. Again, NMT makes this easy, but it can be tempting to think of these engines as one-time investments. They’re not. They’re an ever-evolving solution that requires a little tender loving care.
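For terms that must not be translated, one generic approach is to mask them with placeholders before translation and restore them afterward. The sketch below shows that idea only. It is not SYSTRAN’s API or its built-in mechanism, the `__DNT0__` placeholder format and both function names are assumptions, and the French sentence is written by hand to stand in for engine output. A real engine may alter placeholders, so test this against your own setup.

```python
import re

def mask_terms(text, protected):
    """Swap each protected term for a numbered placeholder.

    Returns the masked text and a placeholder -> term mapping.
    """
    mapping = {}
    # Longest terms first, so a short term never splits a longer one.
    for index, term in enumerate(sorted(protected, key=len, reverse=True)):
        token = f"__DNT{index}__"
        text, count = re.subn(re.escape(term), token, text)
        if count:
            mapping[token] = term
    return text, mapping

def unmask_terms(text, mapping):
    """Put the original terms back in place of their placeholders."""
    for token, term in mapping.items():
        text = text.replace(token, term)
    return text

source = "Contact Jane Doe about Project Falcon."
masked, mapping = mask_terms(source, ["Jane Doe", "Project Falcon"])
print(masked)

# Hand-written stand-in for what an engine might return for the masked text.
translated = "Contactez __DNT1__ au sujet de __DNT0__."
print(unmask_terms(translated, mapping))
```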

10. Add and Maintain Dictionaries

With NMT solutions like SYSTRAN, you can add and maintain your own dictionaries at a sophisticated level. For example, SYSTRAN can identify the correct inflected forms of language to utilize, regardless of their original input in the dictionary. This makes it easy to quickly go in and add language without forcing you to create a dictionary entry for every sub-variation of that language that will get used during translation.

So, here’s where things get cool. SYSTRAN’s NMT learns from previous dictionaries, quickly learns grammar in the context of your organization, and can be customized within days. After that, you can keep adding new content to your dictionary. These layers work together, which allows you to fine-tune your translation output at a granular level. In turn, this makes it incredibly easy to perform robust, end-to-end translation on an ongoing basis.

Your needs will change. The language of your industry will change. You shouldn’t have to invest in a new engine every time that happens. SYSTRAN allows you to quickly add and maintain dictionaries that bring extra flavor and context to your translations.