A new version of SYSTRAN Pure Neural Server (SPNS) was recently released.
What are the most important benefits of updating to 9.4.1? Perhaps the single most important and memorable enhancement relates to how it handles PDF files.
SYSTRAN Pure Neural Server 9.4.1 enhances the File Translation feature for PDF files, with performance and quality improvements to the OCR component and a new mode that extracts text without OCR. It also brings various other fixes and improvements.
In short, the File Translation feature gains an enhanced OCR component, a new flow that optimizes performance, and a new text-only mode for translating compatible PDF files.
The filter now uses IRIS IDRS SDK 15.4.6 for PDF translation, with a new embedded component that loads PDFs directly instead of converting them into image files first. In the past, OCR required first rendering the pages of a PDF into a sequence of PNG image files, which took disk space and extra time. With the new version, the OCR scans the PDF file directly, which speeds up processing. On average, the PDF translation flow is 40% faster than in the previous version, and quality is improved as well.
IRIS IDRS 15.4.6 also adds support for the Vietnamese language, so PDF files in Vietnamese can now be processed by the translation server.
In some scenarios, it is important to retain the structure and layout of a PDF page, so we use OCR to try to reconstruct the same or similar formatting. In other scenarios, however, you might not need this. You might be interested only in the text, plain and simple, and you will want to get it faster than OCR allows.
If you are in this situation, you can now do a simple text extraction instead of an OCR-based conversion. This of course only works if the PDF is not a scanned image and nothing else blocks access to the text you see, such as encryption or password protection.
This scenario is typical in eDiscovery/eAnalytics, where you may face the task of translating thousands of PDF files but don’t need to retain their formatting. You only want the sentences, e.g. to search for keywords and do your analytical work.
This feature is in ‘beta’ mode as of 9.4.1 and is suited to use cases where translation speed and information retrieval matter more than the formatting of the translated file.
When uploading a file in the File Translation menu, users can now choose between the ‘ocr’ mode (the default) and the ‘fast pdf’ mode to translate the PDF file.
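The choice between the two modes described above can be summed up as a small decision rule. The sketch below is purely illustrative: `PdfTraits` and `choose_pdf_mode` are hypothetical names, not part of SPNS or any SYSTRAN API, and the three flags are assumed to be known to the caller in advance. It only encodes the conditions stated in this post; falling back to ‘ocr’ simply mirrors the default and does not imply that OCR will succeed on a protected file.

```python
from dataclasses import dataclass


@dataclass
class PdfTraits:
    """Hypothetical description of a PDF; not part of any SYSTRAN API."""
    has_text_layer: bool   # False for scanned, image-only PDFs
    is_protected: bool     # encrypted or password-protected
    layout_matters: bool   # the translated file should keep its formatting


def choose_pdf_mode(traits: PdfTraits) -> str:
    """Pick 'fast pdf' only when plain extraction can work and layout is not needed."""
    text_accessible = traits.has_text_layer and not traits.is_protected
    if text_accessible and not traits.layout_matters:
        return "fast pdf"
    return "ocr"  # the default mode


# An eDiscovery-style batch file: searchable text, no protection, layout irrelevant.
print(choose_pdf_mode(PdfTraits(has_text_layer=True, is_protected=False, layout_matters=False)))
```

For the eDiscovery-style file in the usage line, the rule selects ‘fast pdf’; a scanned or protected file, or one whose layout must be preserved, stays on the default.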
Here are just a few more features that were added in 9.4.1:
- The user interface is now available in Chinese.
- Performance improvements on translations using GPU (up to 2x faster on V100) with the new Common image.
- In Feedback management, a filter on “Language Pair” has been added to quickly retrieve all feedback for a given language pair (LP).
- In the Statistics view, the new column “Segment cache hits” makes it possible to tell whether a translation request used the cache or generated a new translation. It shows the number of segments that were retrieved from the cache. Reminder: caching with Redis was introduced a few releases ago. It can significantly speed up translations, but it can also be disabled if preferred, for example for reasons of confidentiality. When caching is in use and you are translating highly repetitive content, such as legal disclaimers repeated over and over in a long email chain, you could see translation speeds of 10k-20k characters per second or more, even on a simple single-instance CPU installation.
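To illustrate what a “Segment cache hits” count represents, here is a minimal sketch of segment-level caching. It is not SPNS code: a plain Python dictionary stands in for Redis, `translate_with_cache` is a hypothetical helper, and `str.upper` is used as a placeholder for the translation engine. The point is only that a repeated segment is served from the cache and counted as a hit instead of being translated again.

```python
from typing import Callable, Dict, List, Tuple


def translate_with_cache(
    segments: List[str],
    translate: Callable[[str], str],
    cache: Dict[str, str],
) -> Tuple[List[str], int]:
    """Translate segments, reusing cached results; return (translations, cache_hits)."""
    translations = []
    cache_hits = 0
    for segment in segments:
        if segment in cache:
            cache_hits += 1  # served from the cache, no engine call
        else:
            cache[segment] = translate(segment)  # new translation, stored for reuse
        translations.append(cache[segment])
    return translations, cache_hits


# Placeholder "engine" for illustration only: it just upper-cases the text.
cache: Dict[str, str] = {}
email_chain = ["Hello", "Legal disclaimer", "Thanks", "Legal disclaimer", "Legal disclaimer"]
result, hits = translate_with_cache(email_chain, str.upper, cache)
print(hits)  # 2: the disclaimer is translated once, then reused twice
```

The more repetitive the content, the larger the share of segments that never reach the engine, which is where the speed-up on long email chains comes from.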