By Ryan Peterson, J.D., R.C.A., Discovery Engineer
I was having a conversation with a close friend of mine the other day who also happens to be in the E-Discovery industry. We got to talking about how to treat document families for the various stages of processing, review and production. That conversation got me thinking that while most vendors and E-Discovery attorneys treat this as basic knowledge, if you don’t practice in this area, you may not know the correct terminology, or what you should insist upon with your vendors. I decided I would write about the various stages of the E-Discovery lifecycle and how you should treat document families as you move through them.
For this first blog post, I am going to define the basic document family relationships and discuss standard industry treatment. In my next post, I’ll go into specific issues that can arise in the different stages of the E-Discovery life cycle. By the end of this two-part series, my hope is that when you are presented with these questions, you will be able to speak intelligently on the subject and make an informed decision about how to proceed.
Let’s start with the basics. A document family can be created in multiple ways:
1. E-MAIL FILES
Email data is usually transferred in a container format: emails are exported from their native program to a PST, OST, or NSF file. This methodology preserves the metadata associated with the individual email files. Once data is extracted from email archives, an email and each of its attachments are treated as separate records, but they are linked as a document family through the BeginAttach and EndAttach fields.
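To make the linking concrete, here is a minimal Python sketch of how records that share the same BeginAttach and EndAttach values can be regrouped into families. The control numbers, the BegDoc field, and the file names are hypothetical. The sketch also assumes a common load file convention: every family member carries the same range, and the parent is the record whose own number opens that range. Your vendor's export may populate these fields differently.

```python
from collections import defaultdict

# Hypothetical load-file rows. Only BeginAttach and EndAttach come from the
# text above; BegDoc, FileName, and the control numbers are assumptions.
records = [
    {"BegDoc": "ABC0001", "BeginAttach": "ABC0001", "EndAttach": "ABC0003", "FileName": "status update.msg"},
    {"BegDoc": "ABC0002", "BeginAttach": "ABC0001", "EndAttach": "ABC0003", "FileName": "contract.docx"},
    {"BegDoc": "ABC0003", "BeginAttach": "ABC0001", "EndAttach": "ABC0003", "FileName": "pricing.xlsx"},
    {"BegDoc": "ABC0004", "BeginAttach": "ABC0004", "EndAttach": "ABC0004", "FileName": "memo.pdf"},
]


def group_families(rows):
    """Group rows that share the same (BeginAttach, EndAttach) range."""
    families = defaultdict(list)
    for row in rows:
        families[(row["BeginAttach"], row["EndAttach"])].append(row)
    return dict(families)


def find_parent(members):
    """Return the member whose own number opens the family range, if any."""
    for row in members:
        if row["BegDoc"] == row["BeginAttach"]:
            return row
    return None


families = group_families(records)
email_family = families[("ABC0001", "ABC0003")]
parent = find_parent(email_family)
```

In this sample, ABC0001 through ABC0003 come back as one three-record family with the email as the parent, while the standalone memo forms a family of one.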
Industry standards dictate that this recognized family should not be broken apart without a compelling reason presented by counsel. Even then, the entire family should be produced, but specific documents can be replaced with a placeholder stating the legal reason each was withheld (e.g., Privileged or Confidential Business Information).
2. OTHER CONTAINER FILES
One method of transport for loose documents (i.e., non-email data) is to zip up the loose files into a container of some sort. This also preserves the metadata in the same way that an email archive does. If your dataset contains archives that are not email archive files and are not otherwise attached to an email, most processing tools will treat the extracted contents of the archive (i.e., the documents that reside in the archive file) as a document family, and those documents will be linked using the same fields noted above. It is important to note that there is a difference of opinion within the industry as to whether or not the family relationships created by the processing tools for non-email archive or container files are legitimate document families.
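As a rough illustration of that rule, the sketch below keeps every family member in the production set and swaps each withheld document for a placeholder that states the reason. The field names, control numbers, and placeholder wording are assumptions made for illustration, not a vendor format.

```python
# Hypothetical family: one email and its two attachments.
family = [
    {"BegDoc": "ABC0001", "FileName": "status update.msg"},
    {"BegDoc": "ABC0002", "FileName": "legal advice.docx"},
    {"BegDoc": "ABC0003", "FileName": "pricing.xlsx"},
]

# Withholding decisions supplied by counsel, keyed by control number.
withheld = {"ABC0002": "Privileged"}


def build_production_set(members, withheld_reasons):
    """Produce every member; withheld members become labeled placeholders."""
    production = []
    for row in members:
        reason = withheld_reasons.get(row["BegDoc"])
        if reason is None:
            entry = {"BegDoc": row["BegDoc"], "Produce": "full", "Placeholder": ""}
        else:
            entry = {
                "BegDoc": row["BegDoc"],
                "Produce": "placeholder",
                "Placeholder": f"Document withheld: {reason}",
            }
        production.append(entry)
    return production


production = build_production_set(family, withheld)
```

The point of the design is that the output always has as many entries as the family has members, so the family range stays intact even when individual documents are withheld.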
Some people treat the extracted contents as truly all being interconnected or related in some way. Others, like myself, tend to view the extracted contents of a non-email archive as being nothing more than documents that resided together in a folder (albeit a compressed folder). The logic is that you wouldn’t treat all documents in a folder as being related for document family purposes, so why would you do that for a compressed folder?
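Whichever view you take, it helps to state it as an explicit rule rather than leaving it to a tool default. The sketch below is one hypothetical way to express that rule in Python: families whose top-level parent is an email are always kept, while families whose top-level parent is a loose archive are kept only if you opt in. The extension lists and the function name are assumptions, not settings from any particular processing tool.

```python
from pathlib import PurePath

# Assumed extension lists; adjust to what your dataset actually contains.
EMAIL_PARENT_EXTS = {".msg", ".eml"}
ARCHIVE_PARENT_EXTS = {".zip", ".rar", ".7z"}


def should_keep_family(parent_file_name, archives_are_families=False):
    """Decide whether a tool-created family link should be preserved.

    parent_file_name is the top-level parent of the family, so a zip that
    is attached to an email stays in that email's family either way.
    """
    ext = PurePath(parent_file_name).suffix.lower()
    if ext in EMAIL_PARENT_EXTS:
        return True
    if ext in ARCHIVE_PARENT_EXTS:
        return archives_are_families
    # Other parent types are left as the tool created them.
    return True


keep_email_family = should_keep_family("status update.msg")
keep_zip_family = should_keep_family("loose_files.zip")
keep_zip_family_opt_in = should_keep_family("loose_files.ZIP", archives_are_families=True)
```

With the default setting, the loose zip's contents would be treated as standalone documents, which matches the "compressed folder" view described above.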
3. COMPOUND DOCUMENTS
If your dataset contains files such as a PowerPoint presentation that has Excel, Word, or other PowerPoint files added to it, most processing tools allow you to extract those files into their own records. If you select this option, the tool will create a family relationship between the extracted files and the source file using the same linking fields identified in the Email Files section. This is often referred to as compound document extraction.
As a general note, the industry has been moving away from this process for a few reasons.
First, all of the content that made the parent document responsive or not responsive can be found in the parent document, so extracting the additional files does nothing to add to the decision making process.
Second, there are now tools that will identify, for most document types (Office files, for example), whether the source file has other files added to it. As a result, this can be called out to the reviewer in a read-only field visible when the reviewer is looking at the document, and the reviewer can then inspect the extracted text to ensure that only the text appearing in the source document is being produced should that document be marked for production.
Third, most industry people I talk to see this as problematic, not just from a review standpoint because it adds documents for the reviewers to go through, but also because it raises hosting and production costs on the back end.
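For a sense of how the flag described in the second reason could be derived, here is a minimal Python sketch. It relies on the fact that modern Office files (.docx, .xlsx, .pptx) are ZIP packages, and on the assumption that embedded files sit in an "embeddings" folder inside the package, which is where Office commonly stores them. The function name and the stand-in file are hypothetical, and this is not how any particular processing tool is known to work.

```python
import io
import zipfile

# Folders where Office commonly stores embedded files (an assumption that
# may not cover every producer of these formats).
EMBEDDING_DIRS = ("word/embeddings/", "xl/embeddings/", "ppt/embeddings/")


def list_embedded_parts(package_bytes):
    """Return the names of embedded files found inside an Office package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return [
            name
            for name in package.namelist()
            if name.startswith(EMBEDDING_DIRS) and not name.endswith("/")
        ]


def make_stand_in_pptx():
    """Build a tiny ZIP that mimics a deck's layout (not a valid .pptx)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("ppt/slides/slide1.xml", "<slide/>")
        package.writestr("ppt/embeddings/Microsoft_Excel_Worksheet1.xlsx", b"stand-in bytes")
    return buffer.getvalue()


embedded = list_embedded_parts(make_stand_in_pptx())
has_embedded_files = bool(embedded)  # value for a read-only reviewer field
```

A check like this only flags the parent document. It does not create any new records, which is why it avoids the extra review, hosting, and production volume that extraction brings.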
4. EMBEDDED OBJECTS
Have you ever seen an email with one of those little image files down at the bottom associated with someone’s title block? That is the best example of an embedded object. Essentially, an embedded object is usually an image that has been copied and pasted into a document. Most processing tools allow you to extract those embedded objects into their own files. When this option is selected, the tool will create a family relationship between the source file and the files that were extracted out of it using the same linking fields identified in the Email Files section. Again, this is something the industry is moving away from for the same reasons outlined in the compound document section.
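To illustrate the distinction between a signature image and a true attachment, the Python sketch below builds a sample message with the standard library and sorts its parts by their Content-Disposition header. It assumes the sending mail client marked the signature image as "inline", which real-world email does not always do, so treat it as a simplified picture rather than a reliable detection method. The message content and file names are made up.

```python
from email.message import EmailMessage

# Build a hypothetical message with one real attachment and one inline image.
msg = EmailMessage()
msg["Subject"] = "Status update"
msg.set_content("Please see the attached report.")
msg.add_attachment(
    b"%PDF-1.4 stand-in", maintype="application", subtype="pdf",
    filename="report.pdf",
)
msg.add_attachment(
    b"\x89PNG stand-in", maintype="image", subtype="png",
    filename="logo.png", disposition="inline", cid="<logo@example.invalid>",
)


def split_parts(message):
    """Separate true attachments from inline (embedded) parts."""
    attachments, embedded_objects = [], []
    for part in message.iter_attachments():
        if part.get_content_disposition() == "inline":
            embedded_objects.append(part.get_filename())
        else:
            attachments.append(part.get_filename())
    return attachments, embedded_objects


attachments, embedded_objects = split_parts(msg)
```

In this sample, report.pdf lands in the attachments list and would be a family member, while logo.png lands in the embedded objects list and could simply be left inside the parent email.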
That covers the most common document family relationships. Next time, I will discuss how you should treat those various families and potential pitfalls and caveats in the various stages of the E-Discovery life cycle.