SYSTRAN has been developing and delivering state-of-the-art translation services for over 50 years. In 2016, SYSTRAN partnered with Harvard NLP to create OpenNMT, a leading open-source neural machine translation framework. OpenNMT is available in both PyTorch-based and TensorFlow-based implementations, and it ranked first across metrics in the WNGT 2020 Efficiency Shared Task.
Unlike many other players in the neural machine translation space, SYSTRAN both maintains OpenNMT and provides secure, proven B2B translation solutions built on its own OpenNMT-based platform. OpenNMT currently counts over 500 publications, 3,000 GitHub stars, and several major awards, making it a popular and powerful framework in the NMT industry.
This powerful and flexible OpenNMT core allows SYSTRAN to deliver strong base model quality. Users can work in a variety of environments, and SYSTRAN provides the API, interfaces, plug-ins, and tools needed to support effective multilingual communication. By layering its solution on an open-source core, SYSTRAN gives creators and end-users extensive customizability and flexibility. This has earned it a strong market position and a healthy connection to businesses and LSPs, paving the way for SYSTRAN Model Studio’s business model.
How the Model Studio Works
SYSTRAN partnered with OVH, a global cloud provider, to deliver a state-of-the-art, responsible solution that avoids wasteful compute cycles. All training begins from pre-built SYSTRAN models, either generic or domain-specific, so there is no need to build a translation model from scratch. Instead, you incrementally enhance models on the platform that have already been built and refined by language experts.
SYSTRAN has done much of the upfront work to get you started, and specializing generic or pre-existing models is made easier by the SYSTRAN Model Studio features described below.
Data Preparation
Upload your bilingual or monolingual in-domain corpus (for example, Spanish-English) into the system’s data repository to prepare the model for training. The data remains secure throughout the training process and is not used for any purpose other than training your own model. SYSTRAN’s proprietary technologies are used to clean and prepare the data for neural model training.
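SYSTRAN's cleaning pipeline is proprietary and not described here, but it can help to see what cleaning a bilingual corpus typically involves before you upload one. The following is a minimal sketch, not SYSTRAN's method: the function name `clean_parallel_corpus` and the thresholds `max_length_ratio` and `max_tokens` are illustrative assumptions. It normalizes whitespace, drops empty or overly long segments, removes pairs whose lengths are badly mismatched, and removes exact duplicates.

```python
from typing import Iterable, List, Tuple


def clean_parallel_corpus(
    pairs: Iterable[Tuple[str, str]],
    max_length_ratio: float = 3.0,  # assumed threshold, tune for your language pair
    max_tokens: int = 200,          # assumed threshold
) -> List[Tuple[str, str]]:
    """Apply a few common sanity filters to (source, target) sentence pairs."""
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        # Collapse runs of whitespace and trim both sides.
        src = " ".join(src.split())
        tgt = " ".join(tgt.split())
        # Drop pairs where either side is empty.
        if not src or not tgt:
            continue
        src_len, tgt_len = len(src.split()), len(tgt.split())
        # Drop very long segments.
        if src_len > max_tokens or tgt_len > max_tokens:
            continue
        # Drop pairs whose lengths differ too much (likely misaligned).
        if max(src_len, tgt_len) / min(src_len, tgt_len) > max_length_ratio:
            continue
        # Drop exact duplicates, keeping the first occurrence.
        if (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        cleaned.append((src, tgt))
    return cleaned


sample = [
    ("Hola  mundo", "Hello world"),
    ("Hola mundo", "Hello world"),  # duplicate once whitespace is normalized
    ("", "Empty source"),
    ("Sí", "Yes , this target is far too long for the source"),
    ("¿Cómo estás?", "How are you?"),
]
cleaned_sample = clean_parallel_corpus(sample)
for src, tgt in cleaned_sample:
    print(f"{src}\t{tgt}")
```

Running a pass like this locally gives a rough idea of how much of a corpus is usable, although whitespace token counts are only a crude proxy for languages that do not separate words with spaces.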
Model Training
Building a translation model from scratch is an arduous task. With SYSTRAN Model Studio, you instead select a starting-point model from SYSTRAN’s large translation model catalog, then enhance it with your own domain-specific data so that it is specialized for your translation needs.
By specializing an already-trained SYSTRAN model, you benefit from SYSTRAN’s proprietary technologies, such as embedded UD Sampling, Augmentation, Filtering, Noising, and Tokenization.
Evaluation and Publication
Track your specialized model’s progress at each training iteration with SYSTRAN Model Studio’s scoring module. Within SYSTRAN Model Studio, you can compare how your models’ BLEU scores change across more than 50 gold test sets, curated by SYSTRAN’s data scientists and categorized by domain. You can also add your own test set to check the model’s progress on your specific domain.
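For readers unfamiliar with BLEU, the sketch below shows roughly what the score measures: clipped n-gram overlap between a model's output and a reference translation, combined with a penalty for output that is too short. This is a simplified illustration and not the scoring module's implementation. It assumes whitespace tokenization, a single reference per segment, and no smoothing, and the function names and example sentences are made up for the demonstration. An established tool such as sacreBLEU is the better choice for scores you intend to report.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count every n-gram of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Simplified corpus-level BLEU on a 0-100 scale (one reference, no smoothing)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp_tok, n)
            ref_ngrams = ngram_counts(ref_tok, n)
            # Counter intersection clips each n-gram count at the reference count.
            matches[n - 1] += sum((hyp_ngrams & ref_ngrams).values())
            totals[n - 1] += sum(hyp_ngrams.values())
    # Without smoothing, any n-gram order with zero matches gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


# Hypothetical test set: one reference and the output of two models.
references = ["the patient was given two doses of the vaccine"]
baseline_output = ["the patient was given two dose of the vaccine"]
specialized_output = ["the patient was given two doses of the vaccine"]

baseline_score = corpus_bleu(baseline_output, references)
specialized_score = corpus_bleu(specialized_output, references)
print(f"baseline:    {baseline_score:.1f}")
print(f"specialized: {specialized_score:.1f}")
```

Comparing the two scores on the same test set mirrors the kind of before-and-after comparison described above. BLEU values are only comparable when the test set and tokenization are held fixed.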