When it comes to protecting classified data, blackout redaction has been in use for at least a century. While it is not the only acceptable form of data sanitization, it is the oldest and the one most commonly used by eDiscovery firms. This is despite the fact that more modern, easier-to-use alternatives save time and reduce errors. The two main data sanitization alternatives that meet legal requirements are anonymization and pseudonymization.
In the past, the organizations most concerned with data sanitization included government agencies and law firms. Over the past few years, privacy protection has moved onto the global stage. The European Union’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) now put legal parameters in place for data privacy protection. These laws affect not just how companies collect and use data, but also how they store and share that data.
Could your eDiscovery operation benefit from incorporating more blackout redaction alternatives into your day-to-day?
The Use of Blackout Redaction
Governments have long relied on blackout redaction to keep state secrets out of public view. Attorneys also use redaction to preserve attorney-client privilege while cooperating with opposing counsel. It’s no wonder, then, that so many companies turn to this method when choosing a way to remain compliant with data privacy laws.
If it works for attorneys and government agencies then it must be effective, right? This isn’t always true. The reality is that long-term use of a tool doesn’t necessarily make it the best for the job. Let’s look at how redaction, anonymization, and pseudonymization differ in practice.
Common blackout practice is to simply redact entire blocks of text, completely obliterating any chance of understanding what is being communicated. Terrific for privacy, terrible for context.
Tedious Process
One of the first problems users discover with blackout redaction is that it takes a long time. It requires reading through the data line by line and removing all sensitive data manually. This task has to be completed by someone who understands both the specific case at hand and the applicable data protection laws. That way, they know exactly what they need to pass over with a black marker. It can take days or even weeks for someone to do this, and it leaves plenty of room for human error.
Removes Context
When organizations rely on blackout redaction, they tend to use it liberally. Instead of removing only the sensitive information in a sentence, they may wipe out half the sentence. If a paragraph is affected in several areas, they may mark the entire paragraph in black. Removing this much data certainly protects privacy, but it also leaves the authorized receiving party with more questions than answers. This is useful in instances where the sender deliberately wants to pass on as little context as possible to protect all parties involved. Even then, anonymization can do this just as well, if not better.
Electronic Data Risks
Most people trust blackout redaction because of its use by government agencies. However, even they are not immune to easily avoidable data breaches and leaks. One such instance followed the release of a U.S. military report regarding the death of an Italian secret agent. Readers were able to view the classified information by simply copying the text under the blackouts and pasting it into a word processor.
Note that properly redacting data through blackout redaction takes more steps than automated anonymization does:
- Identify the content that must be redacted.
- Use redaction annotations to permanently replace the information.
- Clean up additional data, such as links, metadata, bookmarks and anything else that might unintentionally pass on confidential information.
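The difference between a cosmetic blackout and the three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Document class, the function names, and the metadata keys are all hypothetical and do not correspond to any particular redaction product or file format.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Toy stand-in for an electronic document (hypothetical model)."""
    text: str
    metadata: dict = field(default_factory=dict)
    overlays: list = field(default_factory=list)  # (start, end) spans painted black


def draw_black_box(doc, secret):
    """Cosmetic blackout: records where to paint, but leaves the text intact."""
    start = doc.text.find(secret)
    if start != -1:
        doc.overlays.append((start, start + len(secret)))


def redact_permanently(doc, secrets, metadata_keys=("author", "comments")):
    """Steps 2 and 3 from the list above; step 1 is the `secrets` argument."""
    for secret in secrets:
        # Step 2: overwrite the underlying characters, not just their appearance.
        doc.text = doc.text.replace(secret, "#" * len(secret))
    for key in metadata_keys:
        # Step 3: drop metadata fields that could leak confidential details.
        doc.metadata.pop(key, None)
    doc.overlays.clear()  # no overlay is needed once the text itself is gone
    return doc
```

After draw_black_box alone, the name is still present in doc.text, which is the same weakness behind the copy-and-paste leak described earlier. Only redact_permanently removes it from both the text and the listed metadata fields.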
The Use of Anonymization and Pseudonymization
Many people now rely on blackout software for redaction. However, a system error or a single missed step from the list above can leave the redaction ineffective. Both anonymization and pseudonymization offer faster and simpler solutions. Because these tools rely on automation, they are also less prone to human error.
Anonymization
Anonymization replaces sensitive words and numbers with masking symbols. When you enter your password to log into a device and the characters render as asterisks, that is anonymization at work in its simplest form. When an entire document needs sanitizing, automatic anonymization tools make it easy to redact specific data.
It also preserves the general context. This ensures that you pass on usable information, but without compromising your clients. See the anonymized example below:
“At the time of the internal audit, there were no discrepancies discovered. On January 10th, 2017 **** ***** spoke with **** *** at ********** * about the importance of ******** *.”
To do this, it relies on features such as “pattern search and redact.” This allows for easy identification and redaction of sensitive data that follow predictable patterns, such as Social Security numbers and credit card information. Using automation to accomplish this saves firms time and money, because you don’t have to divert workers from core business functions to handle it.
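As a rough illustration of what a “pattern search and redact” feature does, the sketch below uses Python regular expressions to mask digits in strings shaped like U.S. Social Security numbers and 16-digit card numbers. The patterns and function names are simplified assumptions made for this example; a production tool would need broader patterns and validation.

```python
import re

# Simplified, illustrative patterns (assumptions, not a complete rule set).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


def mask_digits(match):
    """Replace every digit in the matched text with an asterisk."""
    return re.sub(r"\d", "*", match.group(0))


def pattern_redact(text):
    """Mask every substring that matches one of the known patterns."""
    for pattern in (SSN_PATTERN, CARD_PATTERN):
        text = pattern.sub(mask_digits, text)
    return text


print(pattern_redact("SSN 123-45-6789, card 4111 1111 1111 1111."))
# prints: SSN ***-**-****, card **** **** **** ****.
```

Keeping the separators in place preserves the shape of the value, so a reader can still tell what kind of data was removed.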
When you handle documents written in foreign languages, machine translation can, in most cases, handle the anonymization of some personal data. The ability to set up custom dictionaries in anonymizers allows for even more precise and accurate anonymization.
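A custom dictionary can be pictured as a list of case-specific terms that the anonymizer should always mask. The following is a minimal sketch under that assumption; the function names and the sample terms are hypothetical and are not taken from any specific product.

```python
import re


def build_dictionary_pattern(terms):
    """Compile one pattern that matches any dictionary term as a whole word."""
    # Longest terms first, so a longer entry wins over a shorter one it contains.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)


def mask_keep_shape(match):
    """Mask every non-space character so word lengths stay visible."""
    return re.sub(r"\S", "*", match.group(0))


def dictionary_anonymize(text, terms):
    """Mask every occurrence of a dictionary term in the text."""
    if not terms:
        return text
    return build_dictionary_pattern(terms).sub(mask_keep_shape, text)


custom_terms = ["Alice Smith", "Bob Jones", "Division X"]  # hypothetical entries
print(dictionary_anonymize("Alice Smith spoke with Bob Jones at Division X.", custom_terms))
# prints: ***** ***** spoke with *** ***** at ******** *.
```

The lookarounds keep a short entry from matching inside a longer, unrelated word.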
Pseudonymization
Pseudonymization is another way of making personal data private by removing the pieces that link it to a specific individual. It follows the same premise as anonymization in that it changes the characters. However, instead of symbols, it changes sensitive data to fictitious data to improve readability. This allows you to keep a lot more context because you’ve replaced “Word A” with “Word B.” See the example below:
“At the time of the internal audit, there were no discrepancies discovered. On January 10th, 2017 Anne Jane spoke with Mark Brown at Division X about the importance of Directive Y.”
The main difference here is that it does leave room for possible re-identification, whereas anonymized data cannot be re-identified. This is not always a bad thing. There are instances where an organization may later want to restore those connections once the data is back in the right hands.
The ability to re-identify pseudonymous data therefore makes pseudonymization a useful alternative in some circumstances. Firms should only rely on full anonymization in cases where personal data must not, under any circumstances, be linked to a data subject.
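The re-identification property comes from keeping a mapping between real and fictitious values. The sketch below illustrates the idea with a hypothetical Pseudonymizer class; the real names in the usage lines are invented for the example, and the approach assumes the fictitious values do not already occur in the document.

```python
import re


class Pseudonymizer:
    """Swap real values for fictitious ones and keep the mapping for later."""

    def __init__(self, replacements):
        self.forward = dict(replacements)  # real value -> fictitious value
        self.reverse = {fake: real for real, fake in self.forward.items()}

    @staticmethod
    def _swap(text, mapping):
        if not mapping:
            return text
        # One pass over the text, longest keys first, so no replacement
        # is itself replaced by a later rule.
        keys = sorted(mapping, key=len, reverse=True)
        pattern = re.compile("|".join(re.escape(key) for key in keys))
        return pattern.sub(lambda match: mapping[match.group(0)], text)

    def pseudonymize(self, text):
        return self._swap(text, self.forward)

    def reidentify(self, text):
        return self._swap(text, self.reverse)


mapping = Pseudonymizer({"Alice Smith": "Anne Jane", "Bob Jones": "Mark Brown"})
shared = mapping.pseudonymize("Alice Smith spoke with Bob Jones.")
print(shared)                      # prints: Anne Jane spoke with Mark Brown.
print(mapping.reidentify(shared))  # prints: Alice Smith spoke with Bob Jones.
```

Whoever holds the mapping can restore the original names, so it needs to be stored separately from the shared document. Discarding the mapping moves the result closer to anonymization.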
How To Choose the Best One
Anonymization allows a firm to submit personal information when faced with litigation without breaking data privacy laws. It does this by masking the identifying data that would allow this personal information to be linked to an individual. An example of this is when a firm is required to submit all emails housed on servers in Europe. Even though EU laws restrict the transfer of personal information, firms can work within those limits through anonymization.
If you’re redacting evidence intended to be provided to opposing counsel, then, in theory, blackout redaction could actually be more beneficial. This is because it provides the opposing counsel with even less context to work with as it builds its case. However, if you want to automate the entire process using just one tool, anonymization can achieve the same purpose.
When it comes to the internal sharing of documents, preserving some context is important. This ensures that teammates can still understand what the data is about. That’s where the challenge comes in for firms that rely on blackout redaction. Why would you want to make your teammates’ jobs more difficult by losing as much context as possible? When teammates need to re-identify the subjects of the data later on, pseudonymization offers a much better solution.
Why Choose SYSTRAN for Anonymization
SYSTRAN provides the best of both worlds for organizations that want to transfer data internally and externally. Select between powerful anonymization and pseudonymization tools in just a couple of clicks. This puts full control in your hands when it comes to storing, processing and sharing data while maintaining data privacy. It also helps to protect your firm from running afoul of data privacy laws.
Our eDiscovery anonymization was the brainchild of Reed Smith LLP, an international law firm. The software does more than merely swap characters around or turn names into “John Doe.” It takes a meticulous approach to anonymization by analyzing the full contents of documents, including the metadata and visual text.