NMT Scaling: 4 Ways to Create a Translation Powerhouse

e-Discovery can be a long, daunting process even in the best of times. In today’s globalized world of data, however, you have to worry not only about the sheer amount of information but also about what language the content is in. This is where Neural Machine Translation (NMT) comes in to break that language barrier. As fast as NMT is, though, odds are you have dreamed about how to make your systems even more efficient. How do you ensure any job can get completed on even the most ambitious of timelines?

Scaling is the answer. It’s easy to say, and luckily with a little know-how, it can be equally easy to execute. When using NMT software like SYSTRAN’s, there are several options you can take to improve translation speeds. We’ll lay out and explain the options here so you are equipped to decide which method works best for your organization.

NMT Speeds Without Scaling

First off, why would you want to scale your NMT operations? Depending on the language and hardware you are using, NMT can typically translate about 2,000 characters per second when eight processor cores are assigned to the task (eight cores being the recommended minimum for acceptable translation speeds). At that rate, a novel the length of Dickens’s A Tale of Two Cities (roughly 135,000 words, or on the order of 750,000 characters) takes a little over six minutes, which is not bad for most translation jobs. However, what happens when you have multiple users all trying to translate documents at the same time, sharing the same computer resources? And what if your timeline requires a quick turnaround that your current rate of translation cannot support?

This is why scaling can be so beneficial for your organization. Scaling solutions can be as simple as allocating processors more efficiently or utilizing caching. You can also turn to more complex, hardware-based options such as adding more cores or servers, or switching to GPUs.
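To make the baseline concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes the 2,000 characters per second figure quoted above and, as a simplification that is ours and not a measured property of any engine, that concurrent jobs share the engine's throughput equally. The function name and the 1,000,000-character batch are hypothetical.

```python
def translation_seconds(total_chars: int,
                        chars_per_second: float = 2000.0,
                        concurrent_jobs: int = 1) -> float:
    """Estimate wall-clock seconds for one job.

    Assumption: `concurrent_jobs` jobs split the engine's throughput evenly,
    so each job sees chars_per_second / concurrent_jobs.
    """
    if total_chars < 0 or chars_per_second <= 0 or concurrent_jobs < 1:
        raise ValueError("inputs must be non-negative and rates positive")
    effective_rate = chars_per_second / concurrent_jobs
    return total_chars / effective_rate


# Hypothetical batch of 1,000,000 characters.
alone = translation_seconds(1_000_000)
shared = translation_seconds(1_000_000, concurrent_jobs=4)
print(f"alone: {alone:.0f} s, shared by 4 users: {shared:.0f} s")
# prints "alone: 500 s, shared by 4 users: 2000 s"
```

Under these assumptions, a job that takes about eight minutes on its own stretches to over half an hour when four users compete for the same eight cores.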

Distributing Processors Efficiently

We can all agree that computers are smart. However, we have all heard or experienced frustrating stories of computers behaving very unintelligently. After all, most computers are only programmed to do exactly what you tell them to do without deviation.

This is often the case with resource allocation. Remember, translation engines typically require eight cores to operate effectively. If you are translating multiple languages at the same time with a 24-core processor, you may tell your computer to allocate eight cores to translating Mandarin Chinese, another eight cores to translating German, and another eight for Spanish.

On the surface, this makes sense. You have the recommended number of cores working on each language, so the three languages can be translated simultaneously. But what happens if you only have one or two Spanish documents and over 200 Mandarin Chinese emails? The eight cores dedicated to Spanish will finish their job quickly and then sit idle while the eight Mandarin Chinese cores slowly chug their way through their mountain of data.

Sounds inefficient and illogical, right? It’s not the computer’s fault: it was told to dedicate eight cores to each language, so that’s what it did. Many systems do not automatically balance workloads across all available cores. No matter how much or how little data needs to be processed, the computer will always use the same eight cores to translate the Mandarin Chinese, even as the other cores sit unused.

Dynamically allocating your cores can solve that problem. You can tell your computer to put idle cores to work so that, if your current workload is all one language, it uses all 24 cores to work through the documents instead of the original eight. Once a second language is added, it can adjust to dedicate 12 cores to each one, and so on. Depending on the workload, you could double or even triple translation speeds to meet your needs.
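The difference between the two approaches can be sketched with a small, idealized model in Python. It assumes that throughput scales linearly with the number of cores (250 characters per second per core, which is simply 2,000 divided by eight) and that work can be split freely across cores. Real engines will not scale this cleanly, and the workload sizes and function names below are made up for illustration.

```python
def static_makespan(workloads, cores_per_language=8, chars_per_core_second=250.0):
    """Finish time when every language keeps its own fixed block of cores."""
    rate = cores_per_language * chars_per_core_second
    return max(chars / rate for chars in workloads.values())


def dynamic_makespan(workloads, total_cores=24, chars_per_core_second=250.0):
    """Finish time when cores are re-split evenly among unfinished languages."""
    remaining = {lang: float(chars) for lang, chars in workloads.items() if chars > 0}
    elapsed = 0.0
    while remaining:
        # Every active language gets an equal share of all cores.
        rate = (total_cores / len(remaining)) * chars_per_core_second
        # Advance time until the smallest remaining workload is done.
        work_done = min(remaining.values())
        elapsed += work_done / rate
        # Each active language processed the same amount; drop the finished ones.
        remaining = {lang: left - work_done
                     for lang, left in remaining.items()
                     if left - work_done > 0}
    return elapsed


# Hypothetical workloads in characters: lots of Chinese, very little Spanish.
workloads = {"zh": 400_000, "de": 100_000, "es": 4_000}
print(static_makespan(workloads))   # 200.0 seconds
print(dynamic_makespan(workloads))  # 84.0 seconds
```

In this example the fixed split leaves 16 cores idle for most of the run, while the dynamic split keeps all 24 busy until the last document is done.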

Luckily, reallocating processors efficiently is easy to accomplish — any IT expert should be able to set up your system to improve translation speeds dynamically based on the workload.

If you don’t have an IT expert, you aren’t out of luck. SYSTRAN can manage the resource allocation for you as well. We can provide solutions either hosted by us or deployed behind your own network, and there are various deployment strategies that we can tailor to suit your organization’s needs.

Caching


Another easy way to improve speeds is by caching data. When you look at a large number of documents, you may encounter mountains of repetitive text. For example, emails often start and end with the same cordial phrases each time. A company may put the same legal disclaimer on all of their documents. Documents from one person may use the same words and phrases common to their vocabulary. 

Caching stores translation results so that, when the same phrase comes up again, the software does not have to go through all the work of translating it a second time because it “remembers” the translation from earlier.

Caching can often make translation up to ten times faster; you just need enough memory and storage available to hold a large enough cache. With SYSTRAN, how long you want to cache data and how much storage you want to use can be customized to fit your resources and needs.
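As an illustration of the idea, and not of how SYSTRAN implements it, here is a minimal Python sketch of a translation cache with the two knobs mentioned above: how many entries to keep and how long to keep them. The class name, its parameters, and the stand-in `fake_translate` function are all hypothetical.

```python
import time
from collections import OrderedDict


class TranslationCache:
    """Wraps a translate function with a size-limited, time-limited cache."""

    def __init__(self, translate, max_entries=10_000, ttl_seconds=3600.0,
                 clock=time.monotonic):
        self._translate = translate      # the expensive call we want to avoid
        self._max_entries = max_entries  # storage budget, in cached segments
        self._ttl = ttl_seconds          # how long a cached result stays valid
        self._clock = clock
        self._entries = OrderedDict()    # segment -> (stored_at, translation)
        self.hits = 0
        self.misses = 0

    def get(self, segment):
        now = self._clock()
        cached = self._entries.get(segment)
        if cached is not None and now - cached[0] <= self._ttl:
            self._entries.move_to_end(segment)  # mark as recently used
            self.hits += 1
            return cached[1]
        # Not cached (or expired): translate and remember the result.
        self.misses += 1
        result = self._translate(segment)
        self._entries[segment] = (now, result)
        self._entries.move_to_end(segment)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)   # evict least recently used
        return result


def fake_translate(segment):
    """Stand-in for a real engine call; assumption for this example only."""
    return f"[translated] {segment}"


cache = TranslationCache(fake_translate, max_entries=1000, ttl_seconds=600)
for segment in ["Best regards,", "Please find attached.", "Best regards,"]:
    cache.get(segment)
print(cache.hits, cache.misses)  # prints "1 2"
```

The second "Best regards," never reaches the engine. On a corpus full of repeated greetings and disclaimers, that hit rate is where the speedup comes from.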

Adding More Resources

Once your system is set up to run efficiently, you may still want to scale your translation abilities to boost speeds. At this point, it’s time to look into adding more cores or servers.

The basic idea is simple: the more cores you have allocated to a task, the faster the operation will go. There are, however, some differences between adding more cores to an existing system and bringing in a new server entirely.

Multiple servers not only tend to operate faster than one server with a large number of cores, but they also provide a level of redundancy. Tests that compare a single 64-core server to two servers with 32 cores each usually find that the two-server option runs faster, a result generally attributed to the management overhead of running so many cores on one machine.

There is another reason you may want multiple servers as opposed to one — resiliency. A one-server system creates a single point of failure. If you have to take the server down for maintenance or any other reason, translation stops entirely. This is not an issue if you have multiple servers. Translation may slow down if you have to take a server down, but you still have a secondary, so the work will not come to a halt.
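The resiliency argument can be sketched in a few lines of Python. This is a hedged illustration of round-robin dispatch with failover, not a description of any particular product: the class name is hypothetical, each "server" is just a callable, and a `ConnectionError` stands in for a server that is down for maintenance.

```python
class FailoverDispatcher:
    """Sends each request to the next server in turn, skipping ones that are down."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._next = 0  # index of the server to try first on the next request

    def translate(self, text):
        last_error = None
        for attempt in range(len(self._servers)):
            index = (self._next + attempt) % len(self._servers)
            try:
                result = self._servers[index](text)
            except ConnectionError as error:
                last_error = error  # this server is unavailable; try the next one
                continue
            self._next = (index + 1) % len(self._servers)
            return result
        raise RuntimeError("all translation servers are unavailable") from last_error


def server_a(text):
    """Hypothetical healthy server."""
    return f"A:{text}"


def server_b(text):
    """Hypothetical server that is currently down for maintenance."""
    raise ConnectionError("server B is offline")


dispatcher = FailoverDispatcher([server_b, server_a])
print(dispatcher.translate("hello"))  # prints "A:hello"
```

With a single server in the list, the same outage would raise an error for every request, which is the single point of failure described above.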

The downside to having more servers is complexity. Adding more hardware increases the size of your network, and the servers take up space and require more IT management and attention than simply adding cores to your current system.

Using GPUs

One final way you can significantly boost your NMT speeds is by using Graphics Processing Units (GPUs) instead of traditional CPUs. GPUs can often run five to six times faster than their CPU counterparts, which works out to 10,000 characters per second or more. To put that in context, you could translate the entire Lord of the Rings trilogy, all three books and nearly half a million words, in a matter of minutes.

Yes, GPUs are powerful and fast, but they are also voracious in power and memory consumption. Additionally, they will often cost more than typical CPUs, so you must carefully consider what resources you have available if you are considering a switch to a GPU system for NMT.


Technology is always evolving, improving, getting faster. Data is also evolving at a similar rate. e-Discovery will only get bigger as time moves forward, requiring organizations to process and translate more data faster. Stagnation is anathema to any business’s success: as technology and data evolve, companies must adapt. These scaling techniques will help you stay ahead of the game, conducting NMT on the scale that you need without limits.

Alan, Machine Translation Expert (US Market)