What’s So Special About Domain Specialization?

Language is messy. Ask anyone who has had to learn a second language and they will tell you that the hardest part isn’t learning the rules but understanding the exceptions to them: the real-world application of the language. Jargon, idioms, regional differences, and complex terminology are what make language a living part of any subculture, whether that is a country or a specific industry.

This nuance is one of the biggest struggles for most Neural Machine Translation systems. Computers are generally great at following rules, logic, and straightforward systems. Learning an imperfect, human-built system such as language is where things get more difficult. This is where domain specialization comes in.

What is a Specialized Translation Engine?

You can think of translation engines as falling into two camps: generalized and specialized. A generalized translation engine is trained on a wide variety of topics in a given language: news, sports, medicine, science, technology. It is a jack of all trades that knows enough to get by in the day-to-day world, just like the average person.

However, what happens when you take the average person off the street and ask them to read and understand a doctoral dissertation on neuropsychology? Odds are that they will come across a multitude of terms they can barely pronounce, let alone understand in the appropriate context.

What about simple terms that could have multiple meanings, like the word pin? It seems like a pretty straightforward word. A pin is a noun, an object used to hold two pieces of fabric together. 

Or, is it an acronym for Personal Identification Number?

Is it an object that one rolls a bowling ball at? 

A hairpin or a safety pin? 

A verb? The act of pinning someone down?

“Let’s put a pin in it”?

This is one very simple example of how words can take on drastically different meanings depending on context. A PIN in financial jargon is different from a pin in the fashion world. Specialized Translation Engines are trained not only to understand the technical terms of their field (or domain) but also to handle regional and contextual differences. In eDiscovery, the difference between a financial PIN and a safety pin matters.

This is the advantage that specialized translation engines have over generalized engines. A high-quality specialized translation engine receives specialization training in addition to general training, not in place of it. A general engine can survive the everyday world; a specialized engine can do that and much, much more.
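The pin example and the "in addition to, not in place of" idea can be illustrated with a toy lookup. This is only a sketch: real neural engines learn word senses from training data rather than from tables, and the names `GENERAL_SENSES`, `DOMAIN_SENSES`, and `resolve_sense` are hypothetical, invented here for illustration.

```python
from typing import Dict, Optional

# Toy sense inventories. The labels are illustrative, not real engine data.
GENERAL_SENSES: Dict[str, str] = {
    "pin": "fastener",
    "bank": "financial institution",
}
DOMAIN_SENSES: Dict[str, Dict[str, str]] = {
    "finance": {"pin": "personal identification number"},
    "sports": {"pin": "bowling pin"},
}


def resolve_sense(word: str, domain: Optional[str] = None) -> Optional[str]:
    """Return the domain-specific sense if one exists, else the general one."""
    key = word.lower()
    if domain is not None:
        domain_sense = DOMAIN_SENSES.get(domain, {}).get(key)
        if domain_sense is not None:
            return domain_sense
    # Specialization sits on top of general knowledge, so fall back to it.
    return GENERAL_SENSES.get(key)


print(resolve_sense("PIN", "finance"))   # personal identification number
print(resolve_sense("pin"))              # fastener
print(resolve_sense("bank", "finance"))  # financial institution
```

The last call shows the fallback: the finance table has no entry for "bank", so the general sense is used, which mirrors a specialized engine keeping its general training.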

Depending on the amount of content involved in the training, a translation engine can go from generalized to specialized in a matter of days or weeks through repeated training iterations. SYSTRAN’s software can be specialized for a variety of industries, including IT, legal, medical, biotech, automotive, and many more, to fit whatever eDiscovery needs your organization may have.

Benefits of Specialization

Domain specialization is beneficial in two main ways — it saves your organization both time and money.

According to scientific research, most humans read at a rate of 800 characters per minute. The base translation rate for most Neural Machine Translation engines is 2,000 characters per second, and there are many ways to boost that speed.
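The two figures above use different units (per minute versus per second), so the gap is easier to see once both are normalized. The short calculation below does that. The 1,200,000-character document set is a hypothetical size chosen for illustration, and the comparison is between human reading speed and machine translation throughput, as in the text.

```python
# Figures quoted in the article.
HUMAN_CHARS_PER_MIN = 800
MT_CHARS_PER_SEC = 2_000


def minutes_to_process(num_chars: int, chars_per_min: float) -> float:
    """Time in minutes to get through num_chars at the given rate."""
    return num_chars / chars_per_min


# Normalize the machine rate to characters per minute.
mt_chars_per_min = MT_CHARS_PER_SEC * 60
speedup = mt_chars_per_min / HUMAN_CHARS_PER_MIN

doc_chars = 1_200_000  # hypothetical document set size
human_minutes = minutes_to_process(doc_chars, HUMAN_CHARS_PER_MIN)
mt_minutes = minutes_to_process(doc_chars, mt_chars_per_min)

print(f"MT throughput: {mt_chars_per_min:,} chars/min")               # 120,000 chars/min
print(f"Speedup: {speedup:.0f}x")                                     # 150x
print(f"Human: {human_minutes / 60:.0f} h; MT: {mt_minutes:.0f} min")  # 25 h; 10 min
```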

While this difference in speed holds for any machine translation, generalized or specialized, the added benefit of specialization shows up when a human has to review or quality-check machine-translated content. A specialized translation engine will produce fewer errors, making the human QC work easier and faster. A generalized engine may pick the wrong meaning of “pin” by misreading the context, producing an erroneous translation. Suppose a text message says that someone’s pin wasn’t working. Does that mean they have a wardrobe malfunction, or are they unable to get money from the ATM?

If a generalized engine is translating that text message from English into Farsi, how does it know which meaning of the English word pin to use: the Farsi word for a fastener or the one for a numerical password? A human checker would have to go through all those mistakes and unknowns, correcting them one at a time. A specialized engine, however, will be far more accurate, which means less human work, and less human work means lower cost.

On top of that, one translation engine can be specialized in multiple domains across multiple languages — doing a task that would otherwise take not just one human expert, but possibly a whole team.

The benefits are clear: the difference between general and specialized machine translation is the difference between man-hours and man-minutes, high cost and high savings.

The SYSTRAN User Dictionary

Language evolves in real time, even after an engine has been specialized. This is where the SYSTRAN user dictionary comes in, allowing you to update terminology on the spot without having to undergo an entirely new specialization process.

Specializing a generalized engine takes time. If a specialized translation engine still comes across new words, new spellings (and misspellings), or new ways that humans break the rules of their domain, a human operator can easily add the new word, spelling, or terminology in real time with the user dictionary. This allows the engine to keep growing and improving when a full re-specialization is not warranted. User dictionary customization can even be applied to generalized engines to improve performance on a smaller scale than specialization.
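The idea of adding terms on the spot, with no retraining, can be sketched as a small in-memory term list. This is not SYSTRAN's user dictionary API: the `UserDictionary` class and its `add` and `apply` methods are hypothetical, and the sketch only does whole-word text substitution to show that a new entry takes effect immediately.

```python
import re
from typing import Dict


class UserDictionary:
    """Hypothetical in-memory term list; not SYSTRAN's actual API."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def add(self, source_term: str, preferred: str) -> None:
        """Register a term (or a spelling variant) and its preferred form."""
        self._entries[source_term.lower()] = preferred

    def apply(self, text: str) -> str:
        """Replace every registered term in text, matching whole words only."""
        if not self._entries:
            return text
        # Longest terms first so multi-word entries win over shorter ones.
        terms = sorted(self._entries, key=len, reverse=True)
        pattern = re.compile(
            r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
            re.IGNORECASE,
        )
        # Single pass, so replacement text is never itself re-replaced.
        return pattern.sub(lambda m: self._entries[m.group(0).lower()], text)


ud = UserDictionary()
ud.add("pin number", "PIN")
ud.add("e-discovery", "eDiscovery")
print(ud.apply("Their Pin Number failed during e-discovery review."))
# Their PIN failed during eDiscovery review.
```

Each `add` call changes the output of the very next `apply`, which is the property the user dictionary provides: corrections land right away while the underlying engine stays untouched.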

Flexibility is the key to accuracy. Between SYSTRAN’s specialized engines and the user dictionary, you have the features you need to stay flexible and tailor the system to your specific eDiscovery needs.

Conclusion

Languages are an accurate reflection of human culture: unique and changing by the day. Understanding the nuances of each domain can be difficult for people, let alone machines. Specialized Translation Engines are the best way for organizations to use eDiscovery effectively and efficiently, saving both time and money.

Author
Alan, Machine Translation Expert (US Market)