Data Reduction and Filtering

Document review in eDiscovery was already a time-consuming and painstaking process, but now that Big Data has entered the mix, it has become a cost-prohibitive battle for many companies.

However, if you take a more strategic approach to your data before review begins, you can significantly reduce the volume and complexity of the electronically stored information (ESI) you have to review, and with it your eDiscovery expenses. This is where data reduction and filtering techniques come into play.

Data Reduction is a Requirement, Not an Option

With the enormous volumes of data that most eDiscovery professionals now handle, data filtering strategies are no longer optional. They play a pivotal role in accelerating your review process.

Most data reduction and filtering work happens during the data processing stage, and failing to carry it out properly can put the outcome of your case at risk.

Before any of this can begin, you need to build a searchable index: extract the metadata and text from your data and load them into a database. The collection must be text-searchable so that you can filter it effectively.
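To illustrate what "text-searchable" means in practice, here is a minimal Python sketch of an inverted index, the data structure behind most keyword search: each word maps to the set of documents that contain it. The document IDs, sample text and the `build_index` and `search` helpers are hypothetical; real eDiscovery platforms use a database or dedicated search engine and index the extracted metadata as well as the text.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of document IDs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return dict(index)

def search(index: dict[str, set[str]], *terms: str) -> set[str]:
    """Return the IDs of documents that contain every search term (AND search)."""
    hits = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*hits) if hits else set()

# Hypothetical sample collection.
docs = {
    "DOC-001": "Board minutes discussing the supply contract",
    "DOC-002": "Weekly newsletter: office party",
    "DOC-003": "Draft amendment to the supply contract",
}
index = build_index(docs)
print(sorted(search(index, "supply", "contract")))  # ['DOC-001', 'DOC-003']
```

Because each lookup is a dictionary access rather than a scan of every document, filters can be run and re-run quickly across the whole collection.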

Effective Data Reduction Techniques

Data Filtering by File Types

This is one of the simplest, yet most effective techniques to identify and remove irrelevant files. Most modern eDiscovery platforms allow you to quickly discover what file types appear in your data set.
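As a rough illustration, the Python sketch below produces a file-type inventory by counting extensions across a list of file paths. It identifies type by extension only, which is a simplifying assumption: processing tools generally inspect each file's signature instead, because extensions can be missing or wrong. The function name and sample paths are hypothetical.

```python
from collections import Counter
from pathlib import PurePath

def file_type_summary(paths: list[str]) -> Counter:
    """Count files per lower-cased extension; files with no extension count as '(none)'."""
    return Counter(PurePath(p).suffix.lower() or "(none)" for p in paths)

# Hypothetical collection of file paths.
collection = [
    "custodian1/inbox.pst",
    "custodian1/report.docx",
    "custodian2/Q3 forecast.XLSX",
    "custodian2/notes.docx",
    "marketing/launch.mp4",
    "it/README",
]
# Print the inventory, most common type first.
for extension, count in file_type_summary(collection).most_common():
    print(f"{extension:<8}{count}")
```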

This lets you quickly pinpoint file types that are unlikely to be useful. Container files such as MBOX, Outlook PST and ZIP archives have no relevance by themselves; only their contents matter, so the containers can be set aside once their contents have been extracted. You can also use this technique to remove large, irrelevant media files that take up a lot of storage space, which can save you a significant amount of money.
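A minimal sketch of the exclusion step, assuming each file is represented as a (path, size in bytes) pair and that container files have already been expanded so their contents are part of the collection. The exclusion lists are hypothetical examples that would need to be agreed for each matter. The function also totals the bytes removed, which is one way to estimate hosting and review savings.

```python
from pathlib import PurePath

# Hypothetical exclusion lists for illustration; agree the real lists for each matter.
# Containers should only be excluded after their contents have been extracted.
CONTAINER_TYPES = {".pst", ".mbox", ".zip"}
MEDIA_TYPES = {".mp4", ".mov", ".avi", ".mp3", ".wav"}

def filter_by_type(files, excluded=CONTAINER_TYPES | MEDIA_TYPES):
    """Split (path, size_in_bytes) pairs into kept and excluded lists by extension,
    and total the bytes removed from the review set."""
    kept, dropped = [], []
    for path, size in files:
        target = dropped if PurePath(path).suffix.lower() in excluded else kept
        target.append((path, size))
    bytes_saved = sum(size for _, size in dropped)
    return kept, dropped, bytes_saved

# Hypothetical sample: a PST plus one message already extracted from it.
files = [
    ("custodian1/inbox.pst", 2_000_000_000),
    ("custodian1/inbox/msg-0001.eml", 45_000),
    ("marketing/launch.MP4", 850_000_000),
    ("finance/q3.xlsx", 120_000),
]
kept, dropped, bytes_saved = filter_by_type(files)
print([path for path, _ in kept])  # ['custodian1/inbox/msg-0001.eml', 'finance/q3.xlsx']
print(f"{bytes_saved / 1e9:.2f} GB excluded")  # 2.85 GB excluded
```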

Data Filtering by Email Addresses

Many practitioners point out that search terms run across very large data sets tend to be ‘over-inclusive’: they return hits from irrelevant spam, newsletters, or internal correspondence that has absolutely nothing to do with the case.

By filtering on sender email addresses and domains, you can quickly eliminate irrelevant senders and make your review far more manageable.
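The sketch below shows one way sender-based filtering might work, using Python's standard `email.utils.parseaddr` to pull the address out of a From header. The exclusion lists and sample messages are hypothetical, and the domain check is an exact match, so a subdomain such as `news.marketing.example` would need its own entry. Before applying a filter like this to a whole collection, it is sensible to spot-check a sample of the excluded items.

```python
from email.utils import parseaddr

# Hypothetical exclusion lists for illustration only.
EXCLUDED_ADDRESSES = {"newsletter@vendor.example", "no-reply@alerts.example"}
EXCLUDED_DOMAINS = {"marketing.example"}

def sender_is_excluded(from_header: str) -> bool:
    """True if the sender's address, or its exact domain, is on an exclusion list."""
    _, address = parseaddr(from_header)  # drop the display name, keep the address
    address = address.lower()
    domain = address.rpartition("@")[2]
    return address in EXCLUDED_ADDRESSES or domain in EXCLUDED_DOMAINS

# Hypothetical messages with their raw From headers.
emails = [
    {"id": "E1", "from": "Jane Doe <jane.doe@client.example>"},
    {"id": "E2", "from": "Vendor News <Newsletter@vendor.example>"},
    {"id": "E3", "from": "offers@marketing.example"},
]
remaining = [e["id"] for e in emails if not sender_is_excluded(e["from"])]
print(remaining)  # ['E1']
```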

Deduplication

Deduplication identifies and removes files that are copies of other files in the collection. It can handle both exact copies and near matches. Exact duplicates are typically detected by comparing hash values (digital fingerprints such as MD5 or SHA-1) calculated from each file's content. For near matches, the software analyses every document and flags those whose content overlaps with another document above a set similarity threshold.
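The Python sketch below illustrates both ideas under simplifying assumptions. Exact duplicates are found by hashing each document's bytes with SHA-256 and keeping the first document per hash. Near-duplicates are found by splitting text into overlapping three-word "shingles" and comparing documents with Jaccard similarity (shared shingles divided by all distinct shingles). This is one common measure, not necessarily what any given platform uses, and the 0.8 threshold is an arbitrary example. The pairwise comparison grows quadratically with collection size, so large collections need a more scalable approach.

```python
import hashlib

def exact_duplicates(documents: dict[str, bytes]) -> dict[str, str]:
    """Map each duplicate document ID to the ID of the first copy with identical bytes."""
    first_seen: dict[str, str] = {}  # SHA-256 hex digest -> first document ID
    duplicates: dict[str, str] = {}
    for doc_id, content in documents.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest in first_seen:
            duplicates[doc_id] = first_seen[digest]
        else:
            first_seen[digest] = doc_id
    return duplicates

def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Overlapping k-word sequences of the lower-cased text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a: set, b: set) -> float:
    """Shared elements divided by all distinct elements (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def near_duplicates(texts: dict[str, str], threshold: float = 0.8) -> list[tuple[str, str]]:
    """Return every pair of document IDs whose shingle similarity meets the threshold."""
    sigs = {doc_id: shingles(text) for doc_id, text in texts.items()}
    ids = list(sigs)
    return [
        (a, b)
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
        if jaccard(sigs[a], sigs[b]) >= threshold
    ]

# Hypothetical sample documents.
files = {
    "A": b"Please find the signed contract attached.",
    "B": b"Please find the signed contract attached.",
    "C": b"Lunch menu for Friday",
}
print(exact_duplicates(files))  # {'B': 'A'}

texts = {
    "N1": "The parties agree to extend the supply contract until the end of March",
    "N2": "The parties agree to extend the supply contract until the end of April",
    "N3": "Lunch menu for Friday",
}
print(near_duplicates(texts))  # [('N1', 'N2')]
```

Here N1 and N2 share 10 of their 12 distinct shingles (a similarity of about 0.83), so they are flagged as near-duplicates even though their hash values differ.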

Email Threading

Email threading tools group related emails into a single conversation so they can be viewed as one chain. Legal teams can then assess the relevance of the whole thread at once and, if it has nothing to do with the case, exclude it from review in a single step.
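As a simplified illustration, the Python sketch below groups messages into threads by following each message's In-Reply-To header back to the earliest message held in the collection. It assumes the headers have already been parsed into dictionaries with hypothetical `message_id` and `in_reply_to` keys. Commercial threading tools typically also draw on the References header and message content, so treat this as a sketch of the idea rather than a complete implementation.

```python
from collections import defaultdict

def build_threads(messages: list[dict]) -> dict[str, list[str]]:
    """Group message IDs by the root message of their reply chain."""
    parent_of = {m["message_id"]: m.get("in_reply_to") for m in messages}

    def root(msg_id: str) -> str:
        visited = set()  # guards against malformed, circular reply headers
        # Climb while the parent is a message we actually hold in the collection.
        while parent_of.get(msg_id) in parent_of and msg_id not in visited:
            visited.add(msg_id)
            msg_id = parent_of[msg_id]
        return msg_id

    threads: dict[str, list[str]] = defaultdict(list)
    for m in messages:
        threads[root(m["message_id"])].append(m["message_id"])
    return dict(threads)

# Hypothetical parsed headers.
messages = [
    {"message_id": "<1@corp.example>", "in_reply_to": None},
    {"message_id": "<2@corp.example>", "in_reply_to": "<1@corp.example>"},
    {"message_id": "<3@corp.example>", "in_reply_to": "<2@corp.example>"},
    {"message_id": "<4@corp.example>", "in_reply_to": None},
]
threads = build_threads(messages)
# If a reviewer marks the thread rooted at <1@corp.example> as irrelevant,
# every message in it can be excluded from review in one step.
excluded = set(threads["<1@corp.example>"])
```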


Data reduction should be a core part of your eDiscovery workflow. Firms that do not adopt data filtering techniques are setting themselves up for slower reviews and mountains of eDiscovery costs.

If you’re looking for a reliable managed eDiscovery services provider, we’re here to help.

At GoeDisco, we offer litigation support, secure data hosting services, DSAR solutions, the latest litigation technologies, eDiscovery tools, and much more. For more information about our services, contact us at +44 (0) 207 157 9686 or request a quote!