Our webinar “Get More From SPNS9” on May 15th, 2020 was a huge success. The webinar demonstrated six exciting new upgrades in SYSTRAN Pure Neural Server 9.6 that further scale its technological capabilities. Thank you to those who joined us.
In this post, we have compiled the highlights from the presentation and answers to the questions we received afterward.
The minds behind SYSTRAN sit down for an interview regarding the complexities and the capacities of specialized neural machine translation engines.
Participants: Peter Zoldan, Senior Data Engineer – Software Engineer Linguistic Program; Svetlana Zyrianova, Linguistic Program; Petra Bayrami, Jr. Software Engineer – Linguistic Program; Natalia Segal, R&D Engineer.
How much data is required to create a specialized engine?
The more bilingual data, the better the quality. For broad domains such as news, millions of bilingual sentences will be required. However, if the domain is narrow, such as technical support documents for certain products, then even a small set of 50,000 sentences noticeably improves the quality.
The amount of data required depends on how broad or narrow the domain is for which you are specializing the engine.
Language is messy. Ask any person who has ever had to learn a second language and they will tell you that the most difficult aspect isn’t learning all the rules, but understanding the exceptions to the rules — the real-world application of the language.
In July 2019 we launched SYSTRAN Translate, our free online translator. We were recently thrilled when it reached the symbolic milestone of 1 million users.
You probably think that there are already lots of translation websites out there, so what makes this one different?
1. The Human Factor
SYSTRAN Translate is powered by Neural Machine Translation. In addition, we are bringing in human expertise through a community of language experts all around the world who train translation models and make them even better. That’s right, our neural translation models are in the hands of experts.
e-Discovery can be a long, daunting process even in the best of times. In today’s globalized world of data, however, you not only have to worry about the sheer amount of information but also what language the content is in. This is where Neural Machine Translation comes in to break that language barrier. As fast as NMT is, though, odds are you have dreamed about how to make your systems even more efficient. How do you ensure any job can get completed on even the most ambitious of timelines?
When it comes to protecting classified data, blackout redaction has been in use for at least a century. While it is not the only acceptable form of data sanitization, it is historically the oldest and the one most commonly used by eDiscovery firms. This is despite the fact that there are more modern and easy-to-use alternatives that save time and reduce errors. The two main data sanitization alternatives that meet legal requirements are anonymization and pseudonymization.
As noted by Anju Khurana, Head of Privacy of the Americas, Bank of New York Mellon, “There are now over 100+ privacy laws in the world and GDPR is driving other countries to adopt similar regulations.” (corpcounsel.com, Oct. 2019). The California Consumer Privacy Act (“CCPA”), which comes into effect on January 1, 2020, is the latest, and very likely not the last. Most data privacy experts anticipate that additional states will enact data privacy regulations and think it likely that Congress will eventually do so at the federal level.
SYSTRAN has been wholeheartedly involved in open source development over the past few years via the OpenNMT initiative, whose goal is to build a ready-to-use, fully inclusive, industry- and research-ready development framework for Neural Machine Translation (NMT). OpenNMT ensures that state-of-the-art systems are integrated into SYSTRAN products and motivates us to continuously innovate.
In 2017, we published OpenNMT-tf, an open source toolkit for neural machine translation. This project is integrated into SYSTRAN’s model training architecture and plays a key role in the production of the 2nd generation of NMT engines.
Since the publication of the Executive Order on Maintaining American Leadership in Artificial Intelligence by the White House this past February, many government agencies have been struggling to get started in AI. They realize that using this technology will help them be more efficient. However, finding the tasks that will be “quick wins” in moving towards AI adoption is the main challenge.
Last month, we conducted a webinar, “So, You Think Your Game Is Localized?”, the first of a three-part series presented by Elizabeth Senouci from XTM International and Victor Ramirez from SYSTRAN.
If you couldn’t guess by the title, “So, You Think Your Game Is Localized?” was a webinar focused on Video Game Localization. Senouci and Ramirez are both experts on the topic and thus decided to share their knowledge with the video games community.
In the webinar, Senouci and Ramirez discussed the need for game localization, some of the basic terminology associated with it, user interfaces, global marketing, and the importance of customer service.
“Localization isn’t just one thing you can do and just get done with it. It’s a holistic process and it’s actually customized based on your game, your product,” Elizabeth said in her intro.