The minds behind SYSTRAN sit down for an interview about the complexities and capabilities of specialized neural machine translation engines.
Participants: Peter Zoldan, Senior Data Engineer - Software Engineer, Linguistic Program; Svetlana Zyrianova, Linguistic Program; Petra Bayrami, Jr. Software Engineer, Linguistic Program; Natalia Segal, R&D Engineer.
How much data is required to create a specialized engine?
The more bilingual data, the better the quality. For broad domains such as news, millions of bilingual sentences are required. However, if the domain is narrow, such as technical support documents for certain products, then even a small set of 50,000 sentences noticeably improves quality.
The amount of data required depends on how broad or narrow the domain is that you are specializing the engine for.