It’s that time of year again. Birds are chirping, flowers are blooming, and daily temperatures are wildly fluctuating. Spring is in the air, which can only mean one thing – it’s time to do a little spring-cleaning on your data!

Yes, your data and databases may need a little springtime spruce up. Data spring-cleaning can be like my wife wanting me to go through our closet to tidy up, cull through items to keep, donate things we no longer need, and get rid of old clothing that she feels has to go (No, I will not part with my 20-year-old “painting khakis” – they are comfortable, sensible, and stylish when I paint rooms once every two years).

Much like our closets, garages, attics and offices, data needs regular attention to eliminate clutter and waste that cost both time and money. You wouldn’t have 15 closets in your home with the same clothing, 75 pairs of the same pants, and only one family member who can tell you where your pink shirt is (no, the other one). That would be expensive and wasteful, and it would make life a little challenging, so don’t do that with your data. Let’s get to cleaning!

Where to Begin Spring-Cleaning Your Data

Start your data clean-up by taking inventory of what is potentially in scope. Just as you would pick an area in your home to start spring-cleaning, clearly define a data population to get started. Thinking in terms of eDiscovery data, this might include:

Databases
Create a detailed list of your databases by looking at:

  • Total volumes: Records, images, natives, hosting
  • Users: Does everyone with access still need it? When did users last log in?
  • Duplication of records across databases: How were key decisions, like privilege calls, handled across all of those copies? Make friends with a database administrator, because they can do wonders on this analysis.
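
If your database administrator friend is busy, the duplication check above can be sketched in a few lines of Python. This is only an illustration under assumptions: it presumes you can export one row per document copy as (database, document hash, privilege call), and the function name and sample values are hypothetical rather than taken from any particular review platform.

```python
from collections import defaultdict

def conflicting_privilege_calls(records):
    """Return {doc_hash: {database: call}} for documents whose copies disagree.

    `records` is an iterable of (database, doc_hash, privilege_call) tuples,
    a hypothetical export format used only for this sketch.
    """
    calls_by_doc = defaultdict(dict)
    for database, doc_hash, call in records:
        calls_by_doc[doc_hash][database] = call
    # Keep only documents with more than one distinct call across databases.
    return {
        doc_hash: per_db
        for doc_hash, per_db in calls_by_doc.items()
        if len(set(per_db.values())) > 1
    }

records = [
    ("CaseA", "hash-001", "Privileged"),
    ("CaseB", "hash-001", "Not Privileged"),
    ("CaseA", "hash-002", "Not Privileged"),
    ("CaseB", "hash-002", "Not Privileged"),
]
conflicts = conflicting_privilege_calls(records)
print(conflicts)
```

Only the document whose copies carry different calls shows up in the result, which gives you a short list to review instead of every duplicate.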

Database fields
If you have 1,000 fields in a standard eDiscovery database, you are doing something special and not necessarily in a good way. A lack of field management can lead to slower turnaround time on requests (“Which of the five responsiveness fields do I use?”), incorrect search results, bad production outputs, and slow database performance. Using your database inventory, gather field lists across each database. Clean up and organize what you intend to keep, and identify what is no longer needed. During this field analysis, note consistent fields to be used for a standard database structure going forward.
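
As a rough sketch of that field analysis, assuming you have already exported a list of field names per database (the database and field names below are made up for illustration), a simple count shows which fields appear everywhere and which are one-offs:

```python
from collections import Counter

def field_inventory(fields_by_db):
    """Split field names into those used in every database and those used in one.

    `fields_by_db` maps a database name to its list of field names.
    """
    # Count each field once per database, even if an export lists it twice.
    counts = Counter(
        name for fields in fields_by_db.values() for name in set(fields)
    )
    total_dbs = len(fields_by_db)
    common = sorted(name for name, n in counts.items() if n == total_dbs)
    one_off = sorted(name for name, n in counts.items() if n == 1)
    return common, one_off

fields_by_db = {
    "CaseA": ["Custodian", "DocDate", "Responsive"],
    "CaseB": ["Custodian", "DocDate", "Responsiveness"],
    "CaseC": ["Custodian", "DocDate", "Responsive", "Resp_Final"],
}
common, one_off = field_inventory(fields_by_db)
print("Candidates for the standard structure:", common)
print("One-off fields to review:", one_off)
```

Fields in the first list are natural candidates for a standard database structure. The second list is where the "which of the five responsiveness fields do I use?" problem tends to hide.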

Field values
Just because you know which of the “Bob Smith” values to search in the custodian field (“Bob Smith”; “Smith, Bob”; “Robert Smith”; “Smith, Bobby”) doesn’t mean that everyone else does. Variation in database field values can be just as confusing as not knowing which of the similarly named fields to search. It is one of the leading reasons for incorrect search results, and it is why having fully indexed fields is so important. In CVLynx terms, a simple LOOK report or search results TALLY can help you spot and clean up inconsistencies in your field values.
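
To illustrate the idea of a tally outside of any specific tool, here is a minimal Python sketch. It is not how CVLynx builds a LOOK report or TALLY; it is a stand-in that assumes you have exported the custodian values to a list. The `name_key` helper is hypothetical and only handles case, spacing, and “Last, First” ordering.

```python
from collections import Counter, defaultdict

def name_key(value):
    """Normalize case, spacing, and 'Last, First' ordering for comparison."""
    value = value.strip()
    if "," in value:
        last, first = (part.strip() for part in value.split(",", 1))
        value = f"{first} {last}"
    return " ".join(value.lower().split())

def tally_variants(values):
    """Tally raw values, then group spellings that normalize to the same key."""
    tally = Counter(values)
    groups = defaultdict(dict)
    for raw, count in tally.items():
        groups[name_key(raw)][raw] = count
    # Report only keys with more than one spelling.
    return {key: raws for key, raws in groups.items() if len(raws) > 1}

custodians = ["Bob Smith", "Smith, Bob", "Bob Smith", "Robert Smith", "Smith, Bobby"]
variants = tally_variants(custodians)
print(variants)
```

Note what the sketch does not catch: “Robert Smith” and “Smith, Bobby” stay separate, because nicknames need either a lookup table or a person who knows the custodians. The tally narrows the list, and a human still makes the final call.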

Your scope of data spring-cleaning might also include physical media, file shares, FTP sites, tracking logs, emails, and all those wonderful spreadsheets and reports that you’ve intended to consolidate. Regardless of your “data closet,” there are great tools and methodologies to help you take the next step.

Helpful Tools and Methodologies

Once you’ve identified your scope, look at potential tools and methodologies to help you get started cleaning. In our homes, we use cleaning supplies, household tools, a box for donations, and a trash bag for items that need to be tossed out. There is not a Goodwill store for donating unwanted data, and most organizations frown upon the trash bag approach to data disposition (thank goodness!). We have you covered with four options to get started with your clean-up.


1 – The DMAIC

Working for a Lean Six Sigma (LSS) company and having access to an LSS Master Black Belt (Geoff McPherson) has its advantages, including learning great methodologies. The DMAIC is one of those. It stands for Define, Measure, Analyze, Improve, and Control. It is a great way to structure your data clean-up project, especially for larger initiatives that may take longer to complete. An example approach using the DMAIC:

  • Define: Define your problem or goal. For example: “We now have 1,000 fields in our XYZ database, despite having had 250 when the database was created.” Or, if you are my wife: “Dave has one pair of painting khakis, which is one pair too many.”
  • Measure: Measure and gather stats to identify areas that deserve a deeper dive. An example measurement is: 1,000 fields minus 250 equals 750 fields added since database creation that may need to be analyzed.
  • Analyze: Identify and analyze the WHYs that resulted in 750 new fields added, such as: fields needed for review; those added as the result of various tools and processes; one-off requests for tagging; and fields added but never used. The goal is not to get rid of everything, but to understand what you have and why.
  • Improve: You start to see the fruits of your labor as you work through the Improve step. In the field clean-up example, you would remove and consolidate fields, resulting in a manageable database field list. Then, align definitions so that users understand the use of each field, as well as how the data is populated and maintained.
  • Control: During the control step, put in place policies, standard operating procedures, and database controls to manage field additions and changes going forward. This prevents a constant data clean-up loop as new data and users are added.
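
The Measure and Analyze steps above come down to simple set arithmetic. The sketch below uses a deliberately tiny, made-up field list in place of the 1,000-field example, and it assumes you can get a count of populated records per field. Both function names are hypothetical.

```python
def fields_added_since_creation(current_fields, original_fields):
    """Measure: fields present now that were not in the original build."""
    return sorted(set(current_fields) - set(original_fields))

def never_populated(fields, populated_counts):
    """Analyze: fields with zero populated records (candidates for removal)."""
    return [name for name in fields if populated_counts.get(name, 0) == 0]

original = ["Custodian", "DocDate", "Responsive"]
current = ["Custodian", "DocDate", "Responsive",
           "Responsive2", "HotDoc_Temp", "Issue_Tag"]
# Hypothetical count of records with a value in each field.
populated = {"Custodian": 5000, "DocDate": 5000, "Responsive": 4200,
             "Responsive2": 310, "HotDoc_Temp": 0, "Issue_Tag": 875}

added = fields_added_since_creation(current, original)
unused = never_populated(added, populated)
print(f"{len(added)} fields added since creation: {added}")
print(f"Added but never used: {unused}")
```

The counts only tell you where to look. Deciding why a field was added, and whether it can go, is still the Analyze conversation with the people who use the database.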

2 – The 5S’s

Another great LSS methodology to help with your spring-cleaning is the 5S’s, which stands for: Sort, Set in Order, Shine, Standardize, and Sustain. Use the 5S process to sort out what you don’t need, set in order what you do need, clean it all up for ease of access, and put some controls around it for stronger defensibility and maintained efficiency. Similar to the DMAIC, the 5S approach gives your process a structure that leads to lower error rates, as well as increased productivity and data efficiencies.

3 – Other Tools

Sometimes it takes the right tool or solution to fast-track your data clean-up and get it properly organized. At TCDI, we help our clients do just that with various litigation and corporate data using our CVLynx software. In fact, we did the same for our own data. A few years ago, we had about a dozen sources of internal data related to clients, projects, documents, and various metrics that were redundant and difficult to manage.

We utilized the DMAIC and 5S processes and pulled everything together in our own TCDI CVLynx site, so that it could be better organized and easily searchable. Our Development team has recently taken this further by creating our CVUnity solution, which gives us greater insight and control across our valuable data points and systems. Find the tools or solutions that work for you and use them to organize and sustain your clean-up process.

4 – Phone-A-Friend

Remember those database administrators I mentioned earlier? Seek assistance from the appropriate technical resources, project managers, processing team members, end users, and other subject matter experts as you do your data spring-cleaning.

Data spring-cleaning may require many different resources, depending on your scope and timeline. It can take a village to do two things: raise kids and clean up data. You’ll probably only get buy-in from colleagues to help with one of those, but you never know.

Don’t Forget the Last “S” or “C”!

Now that you’ve scoped your clean-up work and executed using your tools and methodologies, there’s one final thing to remember before you can enjoy your newly cleaned databases, fields, values, and other data populations.

Regardless of the approach that is used, do not forget the “C” in DMAIC or the last “S” of the 5S’s – Control and Sustain. If you put proper controls, documentation, workflows, and processes in place, you won’t end up in the same place six months down the road. In our 1,000-field example, this could mean putting guidelines or restrictions around how fields are added to a database and who can add them. Training, audits, and documentation can help establish control and sustain all of your hard work when it comes to keeping your data clean.
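
One way to picture such a guideline is a small pre-check that runs before anyone adds a field. Everything here is an assumption made for illustration: the approver roles, the naming convention, and the function itself are hypothetical, and your own policy would set the real rules.

```python
import re

# Hypothetical policy: who may add fields, and what field names must look like.
APPROVERS = {"dba_admin", "project_manager"}
NAME_PATTERN = re.compile(r"[A-Z][A-Za-z0-9]*(_[A-Z][A-Za-z0-9]*)*")

def field_request_problems(name, requested_by, existing_fields):
    """Return a list of policy problems; an empty list means the request passes."""
    problems = []
    if requested_by not in APPROVERS:
        problems.append(f"{requested_by} is not authorized to add fields")
    if not NAME_PATTERN.fullmatch(name):
        problems.append(f"'{name}' does not follow the naming convention")
    if name.lower() in {field.lower() for field in existing_fields}:
        problems.append(f"'{name}' duplicates an existing field")
    return problems

existing = ["Custodian", "DocDate", "Responsive"]
ok = field_request_problems("Responsive_Final", "dba_admin", existing)
rejected = field_request_problems("custodian", "reviewer1", existing)
print(ok)
print(rejected)
```

A check like this does not replace training, audits, or documentation. It just makes the written policy a little harder to forget.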

Let’s face it, very few people get excited about spring-cleaning their home, let alone their data. It becomes necessary over time, but it doesn’t have to be an overwhelming challenge. If you find yourself needing help to get started, remember that there are people, tools, and resources ready to help. Just as long as it doesn’t involve getting rid of a pair of 20-year-old khakis with paint splatters.