Language is messy. Ask any person who has ever had to learn a second language and they will tell you that the most difficult aspect isn’t learning all the rules, but understanding the exceptions to the rules: the real-world application of the language.
When it comes to protecting classified data, blackout redaction has been in use for at least a century. While it is not the only acceptable form of data sanitization, it is the oldest and the one most commonly used by eDiscovery firms, even though more modern, easier-to-use alternatives exist that save time and reduce errors. The two main data sanitization alternatives that meet legal requirements are anonymization and pseudonymization.
Machine Translation users care about quality and performance. Based on our own observations and the feedback we’ve received, the quality of our Neural MT is impressive. Evaluating performance is a stickier subject, but we’d like to dig in and present our innovations and achievements and how they benefit NMT users.
By performance we mostly mean how a system behaves in terms of speed and efficiency in varying production environments. It is important to note that performance and quality in Neural MT are tightly connected: it is easy to accelerate a given model by compromising on quality. Therefore, when evaluating performance improvements, we always check that quality remains very close to optimal.
Since switching to NMT at the end of 2016, we’ve invested our R&D efforts into optimizing our engines to be more efficient, while maintaining and even improving translation accuracy. Our 2nd generation NMT engines, available in the latest release of SYSTRAN Pure Neural® Server, implement several technical optimizations that make translation faster and more efficient.
New model architecture
The first generation of neural translation engines was based on recurrent neural networks (RNN). This architecture requires the source text to be encoded sequentially, word by word, before generating the translation.
Since 2016, there has been a sharp increase in open source machine translation projects based on neural networks, known as Neural Machine Translation (NMT), led by companies such as Google, Facebook and SYSTRAN. Why have machine translation and NMT-related innovations become the new Holy Grail for tech companies? And does the future of these companies rely on machine translation?
Never before has a technological field undergone so much disruption in such a short time. Invented in the 1960s, machine translation was based on grammatical and syntactical rules until 2007. Statistical modelling (known as statistical machine translation or SMT), which matured largely thanks to the abundance of data, then took over. Although statistical translation was introduced by IBM in the 1990s, it took 15 years for the technology to reach mass adoption. Neural Machine Translation, on the other hand, took only two years to be widely adopted by the industry after being introduced by academia in 2014, which shows how innovation in this field is accelerating. Machine translation is currently experiencing a golden age of technology.
From Big Data to Good Data
Not only have these successive waves of technology differed in their pace of development and adoption, but their key strengths or “core values” have also changed. In rule-based translation, value came from code and accumulated linguistic resources. For statistical models, the amount of data was paramount. The more data you had, the better the quality of your translation and your evaluation via the BLEU score (Bilingual Evaluation Understudy, the most widely used algorithm for measuring machine translation quality). Now, the move to machine translation based on neural networks and Deep Learning is well underway and has brought about major changes. The engines are trained to learn language as a child does, progressing step by step. The challenge is not only to process exponentially growing volumes of data (Big Data) but, more importantly, to feed the engines the highest-quality data possible. Hence the interest in “Good Data.”
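For readers unfamiliar with how a BLEU-style score is computed, here is a minimal sketch in Python. It is a simplified illustration, not the evaluation setup used by any vendor mentioned here: it assumes a single reference translation, whitespace tokenization and no smoothing, and the function names `ngrams` and `simple_bleu` are made up for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against a single reference.

    Assumptions (illustrative only): one reference, whitespace
    tokenization, uniform n-gram weights, no smoothing.
    """
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            # Without smoothing, any zero precision makes the score zero.
            return 0.0
        log_precisions.append(math.log(matched / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(simple_bleu("the cat sat on the mat", reference))
    print(simple_bleu("the cat sat on a mat", reference))
```

The score rewards word sequences shared with the reference, which is one reason more training data tended to raise BLEU for statistical systems. Production evaluations typically use corpus-level scores, several references and standardized tokenization, so numbers from this sketch are not comparable to published results.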
As of January 3rd 2018, companies in the financial industry operating in Europe are required by law to fully comply with the new MiFID II regulation. A good portion of the new rules requires translating various documents for a multilingual audience.
1 – Effortlessly translate detailed information on tons of transactions
2 – Easily provide investors with multilingual research reports and articles
3 – Produce E-Learning and other company material to educate employees across the EU on complying with these new regulations
4 – Translate contracts and other official investment documents
You can find the whole infographic here.
[This article originally appeared on Kirti Vashee’s Blog]
This is the final post of 2017, a guest post by Jean Senellart, who has been a serious MT practitioner for around 40 years, with deep expertise in all the technology paradigms that have been used to do machine translation. SYSTRAN has recently been running tests building MT systems with different datasets and parameters to evaluate how data and parameter variation affect MT output quality. As Jean said:
“We are continuously feeding data to a collection of models with different parameters – and at each iteration, we change the parameters. We have systems that are being evaluated in this setup for about 2 months and we see that they continue to learn.”
This is more of a vision statement about the future evolution of MT technology, in which systems continue to learn and improve, than a direct report of experimental results, and I think it is a fitting way to end the year on this blog.
It is very clear to most of us that deep learning based approaches are the way forward for continued MT technology evolution. However, skill with this technology will come with experimentation and an understanding of data quality and control parameters. Babies learn by exploration and experimentation, and maybe we need to approach our continued learning in the same way, by learning from purposeful play. Is this not the way that intelligence evolves? Many experts say that AI is going to drive learning and evolution in business practices in almost every sphere of business.
SYSTRAN’s solutions are used every day by various types of companies across many industries to get the most accurate and secure automatic translations of any type of content, from sensitive documents to websites to mobile apps and much more. We’d like to focus today on how one of our clients, the consultancy firm Alvarez & Marsal, uses SYSTRAN’s platform to manage eDiscovery projects with the highest efficiency and accuracy.
The processes and tools used in eDiscovery scenarios are, most of the time, quite complex given the large volumes of electronic data produced. Unlike hard-copy evidence, e-documents are a lot more dynamic and contain various metadata that demand the highest translation quality in order to eliminate any claims of spoliation at any time in a litigation case.
Phil Beckett, the firm’s Managing Director, who was recently named ‘Investigation Digital Forensic Expert of the year’ by Who’s Who Legal, talks to us about how SYSTRAN’s solutions plug into their internal processes to manage their projects end to end.
Phil Beckett – Managing Director at Alvarez & Marsal
When a global enterprise gets sued, it’s vital to know who is involved and how. But finding out who to blame isn’t always simple.
Global law firms are tasked with sifting through thousands, sometimes millions of emails, chats, and legal documentation during eDiscovery. These documents and audio recordings could be in many different languages and stored around the world. Sometimes that data is stored in countries with strong data protection regulations, such as Brazil and parts of the EU, so it cannot under any circumstances leave the country.
So, how can an office in the U.S. review hundreds of days of correspondence in multiple languages?
If the firm hires translators, they’ll need dozens, each with strong knowledge of everything from slang to the subject matter under discovery. If instead they decide to go with an eDiscovery translation solution, they’ll still need help during the review process, especially for data in Asian languages, where one word may have several interpretations and five slang alternatives. In either case, the team must spend a lot of time and money to get reliable and accurate results.
Until now, that is.
Don’t let language be a hurdle in your business
Lean Manufacturing involves constant efforts to eliminate or reduce ‘muda’ (a Japanese term for waste, meaning any activity that consumes resources without adding value) in design, manufacturing, distribution and customer service processes.
As an operational system, Lean Manufacturing maximizes added value, reduces essential support and eliminates waste in all processes throughout the value chain. Waste in this regard may include over-production, inventory tasks, waiting time, correction, transportation and over-processing.
In summary, the equation for Lean Management is simple: profitability increases either by raising prices or by reducing costs. A big part of the cost is the turnaround time between an order being placed and the moment it is shipped.
What is your Manufacturing Lead Time from the placement of an order to shipping? What would you expect as the standard for your business? What do you foresee as a likely evolution for both you and your customers?
The world has become a global village: electronic commerce is truly international and must be standardized
The 2nd Logistics Information Standardization Forum, held in Seoul, South Korea in September 2016, brought together a host of actors in an effort to standardize international logistics information as a key factor in improving logistics processes in business. The forum proposed building an international consensus on a cooperation system for international logistics information.
At the AFNET association, we promote and develop such standards so as to improve relationships between organizations and enterprises. Standardizing electronic commerce for logistics means defining a common document structure for order processing transactions and product deliveries.