The Sedona Conference® Glossary, 3rd Edition, Copyright © 2010, Reprinted with permission.
Cache: A dedicated, high speed storage location that can be used for the storage of frequently used data. As data may be retrieved more quickly from cache than the original storage location, cache allows applications to run more quickly. Web site contents often reside in cached storage locations on a hard drive.
Caching: The storing of frequently-used data to speed access. See also Cache.
CAD (Computer Aided Design): The use of a wide range of computer-based tools that assist engineers, architects, and other design professionals in their design activities.
Case De-Duplication: Eliminates duplicates to retain only one copy of each document per case. For example, if an identical document resides with three custodians, only the first custodian’s copy will be saved. Also known as Cross Custodial De-Duplication, Global De-Duplication or Horizontal De-Duplication. See
Catalog: See Index.
CCD (Charge Coupled Device): A computer chip whose output correlates with the light or color passed by it. Individual CCDs or arrays of them are used in scanners as a high-resolution digital camera to read documents.
CCITT: Consultative Committee for International Telephone & Telegraphy. Sets standards for phones, faxes, modems, etc. The standard exists primarily for fax documents.
CCITT Group 4: A lossless compression technique/format that reduces the size of a file, generally about 5:1 over RLE and 40:1 over bitmap. CCITT Group 4 compression may only be used for bi-tonal images.
CDPD (Cellular Digital Packet Data): A data communication standard utilizing the unused capacity of cellular voice providers to transfer data.
CD-R, CD+R (Compact Disk Recordable): See Compact Disk.
CD-RW (Compact Disk Re-Writable): See Compact Disk.
CD-ROM (Compact Disk Read-Only Memory): See Compact Disk.
Certificate: An electronic affidavit vouching for the identity of the transmitter. See Digital Certificate, Digital Signature, PKI Digital Signature.
CGA (Color Graphics Adapter): See Video Graphics Adapter (VGA).
Chaff/winnowing: Advanced encryption technique involving data dispersal and mixing.
Chain of Custody: Documentation and testimony regarding the possession, movement, handling, and location of evidence from the time it is obtained to the time it is presented in court or otherwise transferred or submitted; used to prove that evidence has not been altered or tampered with in any way; necessary both to assure admissibility and authenticity.
Character Treatment: The use of all caps or another standard format for treating letters in a coding project.
Checksum: A value used to ensure data is stored or transmitted without error. It is created by calculating the binary values in a block of data using some algorithm and storing the results with the data. When the data is retrieved from memory or received at the other end of a network, a new checksum is computed and matched against the existing checksum. A non-match indicates an error.
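The compute, store, and re-compute cycle described above can be sketched as follows. This is a minimal illustration only: the additive 16-bit algorithm and the names simple_checksum, store, and verify are assumptions chosen for brevity, not a method named in this glossary.

```python
def simple_checksum(data: bytes) -> int:
    """Sum all byte values and keep the low 16 bits (illustrative algorithm only)."""
    return sum(data) & 0xFFFF


def store(data: bytes) -> tuple[bytes, int]:
    """Return the data together with the checksum computed at storage time."""
    return data, simple_checksum(data)


def verify(data: bytes, stored_checksum: int) -> bool:
    """Recompute the checksum on retrieval and compare it with the stored value."""
    return simple_checksum(data) == stored_checksum


block, checksum = store(b"Sedona Glossary")
intact = verify(block, checksum)                  # data unchanged: values match
damaged = verify(b"Sedona Glossarz", checksum)    # one byte altered: non-match
```

A simple additive checksum like this one misses some errors (for example, two bytes swapped), which is why stronger algorithms are generally preferred in practice.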
Child: See Document.
CIE (Commission International de l’Eclairage): The international commission on color matching and illumination systems.
CIFS (Common Internet File System): Used for client/server communication within Microsoft® operating systems. With CIFS, users with different platforms and computers can share files without having to install new software.
Cine-Mode: Data recorded on a film strip such that it can be read by a human when held vertically.
Cinepak: A compression algorithm; see MPEG.
CITIS (Contractor Integrated Technical Information Service): The Department of Defense now requires contractors to have an integrated electronic document image and management system.
Clawback Agreement: An agreement outlining procedures to be followed to protect against waiver of privilege or work product protection due to inadvertent production of documents or data.
Client: Any computer system that requests a service of another computer system. A workstation requesting the contents of a file from a file server is a client of the file server. See Thin Client. Also commonly used as synonymous with an email application, by reference to the Email Client.
Client Server: An architecture whereby a computer system consists of one or more server computers and numerous client computers (workstations). The system is functionally distributed across several nodes on a network and is typified by a high degree of parallel processing across distributed nodes. With client-server architecture, CPU-intensive processes (such as searching and indexing) are completed on the server, while image viewing and OCR occur on the client. This dramatically reduces network data traffic and insulates the database from workstation interruptions.
Clipboard: A holding area that temporarily stores information copied or cut from a document.
Cloud Computing: “[A] model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” http://csrc.nist.gov/groups/SNS/cloud-computing/ (last visited June 22, 2010). For further explanation see the NIST Web site cited.
Cluster (File): The smallest unit of storage space that can be allocated to store a file on operating systems. Windows and DOS organize hard disks based on Clusters (also known as allocation units), which consist of one or more contiguous sectors. Disks using smaller cluster sizes waste less space and store information more efficiently.
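The point about smaller clusters wasting less space can be made concrete with a little arithmetic. The sketch below assumes a file always occupies a whole number of clusters; the function names and the example sizes (a 5,000-byte file, 4,096-byte and 512-byte clusters) are hypothetical.

```python
def allocated_bytes(file_size: int, cluster_size: int) -> int:
    """Space reserved for a file: whole clusters only, rounded up."""
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size


def wasted_bytes(file_size: int, cluster_size: int) -> int:
    """Unused space in the last cluster allocated to the file."""
    return allocated_bytes(file_size, cluster_size) - file_size


large_cluster_waste = wasted_bytes(5000, 4096)  # 2 clusters = 8192 bytes, 3192 unused
small_cluster_waste = wasted_bytes(5000, 512)   # 10 clusters = 5120 bytes, 120 unused
```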
Cluster (System): A collection of individual computers that appear as a single logical unit. Also referred to as matrix or grid systems.
Cluster bitmap: Used in NTFS (New Technology File System) to keep track of the status (free or used) of clusters on the hard drive. See NTFS.
Clustering: See Data Categorization.
CMYK: Cyan, Magenta, Yellow, and Black. A subtractive method used in four color printing and Desktop
Coding: Automated or human process by which documents are examined and evaluated using pre-determined codes, and the results recorded. Coding usually identifies names, dates, and relevant terms or phrases. Coding may be structured (limited to the selection of one of a finite number of choices), or unstructured (a narrative comment about a document). Coding may be objective, i.e., the name of the sender or the date, or subjective, i.e., evaluation as to the relevancy or probative value of documents. See Bibliographical/Objective Coding, Indexing, Level Coding, Subjective Coding, and Verbatim Coding.
COLD (Computer Output to Laser Disk): A computer programming process that outputs electronic records and printed reports to laser disk instead of a printer.
COM (Computer Output to Microfilm): A process that outputs electronic records and computer generated reports to microfilm.
Comb: A series of boxes with their top missing. Tick marks guide text entry and separate characters. Used in forms processing rather than boxes.
Comic Mode: Human-readable data, recorded on a strip of film that can be read when the film is moved horizontally to the reader.
Comma Separated Value (CSV): A record layout that separates data fields/values with a comma and typically encloses data in quotation marks.
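As an illustration of this layout, the sketch below writes and re-reads two records with Python's standard csv module, quoting every field. The field names and values are invented for the example.

```python
import csv
import io

# Hypothetical records; note the comma inside the custodian's name.
rows = [
    ["DocID", "Custodian", "Subject"],
    ["0001", "Smith, Jane", "Q3 report"],
]

# Write the records with every field enclosed in quotation marks.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
writer.writerows(rows)
csv_text = buffer.getvalue()

# Read the text back; the quoted comma stays inside its field.
parsed = list(csv.reader(io.StringIO(csv_text)))
```

The quotation marks are what allow a value such as "Smith, Jane" to contain a comma without being split into two fields.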
Compact Disk (CD): A type of optical disk storage media, compact disks come in a variety of formats. These formats include CD-ROM (“CD Read-Only Memory”) – read-only; CD-R or CD+R (“CD Recordable”) – can be written to once and are then read-only; and CD-RW (“CD Re-Writable”) – can be written to multiple times.
Compliance Search: The identification of and search for relevant terms and/or parties in response to a discovery request.
Component Video: Separates video into luminosity and color signals that provide the highest possible signal quality.
Composite Video: Combines red, green, blue and synchronization signals into one video signal so that only one connector is required; used by most TVs and VCRs.
Compound Document: A file that collects or combines more than one document into one, often from different applications, by embedding objects or linked data; multiple elements may be included, such as images, text, animation, or hypertext. See also OLE.
Compression: Compression algorithms such as Zip and RLE reduce the size of files, saving storage space and reducing the bandwidth required for access and transmission. Data compression is widely used in backup utilities, spreadsheet applications, and database management systems. Compression generally eliminates redundant information and/or predicts where changes will occur. “Lossless” compression techniques such as Zip and RLE preserve the integrity of the input. Coding standards such as JPEG and MPEG employ “lossy” methods that do not preserve all of the original information, and are most commonly used for photographs, audio, and video. See Container File, Decompression, Lossless Compression, and Lossy Compression.
Compression Ratio: The ratio of the size of an uncompressed file to the size of its compressed version. For example, with a 10:1 compression ratio, a 10 KB file can be compressed to 1 KB.
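The ratio can be computed directly from the two sizes. The sketch below also measures it for a lossless round trip using Python's standard zlib module; the function names and the repetitive sample text are assumptions for illustration, and the ratio achieved on real files depends heavily on their content.

```python
import zlib


def ratio_from_sizes(uncompressed_size: int, compressed_size: int) -> float:
    """Compression ratio expressed as a single number (10.0 means 10:1)."""
    return uncompressed_size / compressed_size


def measured_ratio(data: bytes) -> float:
    """Compress the data losslessly and report the ratio actually achieved."""
    return ratio_from_sizes(len(data), len(zlib.compress(data)))


example = ratio_from_sizes(10 * 1024, 1 * 1024)   # 10 KB down to 1 KB

sample = b"privileged and confidential " * 500     # highly repetitive text
achieved = measured_ratio(sample)

# Lossless: decompressing returns exactly the original bytes.
restored = zlib.decompress(zlib.compress(sample))
```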
Computer Forensics: Computer Forensics is the use of specialized techniques for recovery, authentication, and analysis of electronic data when an investigation or litigation involves issues relating to reconstruction of computer usage, examination of residual data, authentication of data by technical analysis, or explanation of technical features of data and computer usage. Computer forensics requires specialized expertise that goes beyond normal data collection and preservation techniques available to end-users or system support personnel, and generally requires strict adherence to chain-of-custody protocols. See also Forensics and Forensic Copy.
Computer: Includes but is not limited to network servers, desktops, laptops, notebook computers, mainframes, and PDAs (personal digital assistants).
Concatenate: Generally, to add by linking or joining to form a chain or series; two or more databases of similar structure can be concatenated to enable the user to reference them as one.
Concept search: The use of word meanings to identify documents relevant to a specific query. Word meanings can be derived from any of a number of sources, including dictionaries, thesauri, taxonomies, ontologies, or computed mathematically from the context in which the words occur. Concept searching typically improves the relevance ranking of the search results and can identify additional documents that are meaningfully related to the query even if they do not have the specific query term in them.
Conceptual Analytics: Using one or more of a number of mathematical algorithms or linguistic methodologies to group documents by their common themes or ideas.
Container File: A single file containing multiple documents and/or files, e.g., .pst, .nsf and .zip files. The file must be ripped or decompressed to determine volume, size, record count, etc., and to be processed for litigation review and production. See Decompression and Rip.
Content Comparison: A method of de-duplication that compares file content or output (to image or paper) and ignores metadata. See also De-Duplication.
Contextual Search: Using one of a number of mathematical algorithms or linguistic methodologies to enlarge search results to include not only exact term matches but also matches where terms are considered in context of how and where they frequently occur in a specific document collection or more general taxonomy. For example, a search for the term “diamond” may bring back documents related to baseball but with no reference to the word diamond because they frequently occur within the same documents and therefore have a logical association.
Continuous Tone: An image (e.g., a photograph) that has all the values of gray from white to black.
Convergence: Integration of computing, communications, and broadcasting systems.
Cookie: A text file containing tracking information such as dates and times of Web site visits, deposited by a Web site onto a user’s computer. The text file is accessed each time the Web site is visited by a specific user and updated with browsing and other information. The main purpose of cookies is to identify users and possibly prepare customized Web sites for them, including the personalization of advertising appearing on the Web sites.
Coordinated Universal Time (UTC): A high-precision atomic time standard with uniform seconds defined by International Atomic Time and leap seconds announced at regular intervals to compensate for the Earth’s slowing rotation and other discrepancies. Leap seconds allow UTC to closely track Universal Time, a time standard based not on the uniform passage of seconds, but on the Earth’s angular rotation. Time zones around the world are expressed as positive or negative offsets from UTC. Local time is UTC plus the time zone offset for that location, plus an offset (typically +1) for daylight saving time, if in effect. For example, 3:00 a.m. Mountain Standard Time = 10:00 UTC – 7. As the zero point reference, UTC is also referred to as Zulu time (Z). See also Normalization.
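The Mountain Standard Time example can be reproduced with Python's standard datetime module. This sketch assumes a fixed UTC – 7 offset with no daylight saving adjustment; the calendar date is arbitrary and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset for Mountain Standard Time: seven hours behind UTC.
MST = timezone(timedelta(hours=-7), "MST")

# 3:00 a.m. local time on an arbitrary winter date.
local_time = datetime(2010, 1, 15, 3, 0, tzinfo=MST)

# Normalizing to UTC adds the seven hours back.
utc_time = local_time.astimezone(timezone.utc)
utc_text = utc_time.isoformat()   # "2010-01-15T10:00:00+00:00"
```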
Corrupted File: A file damaged in some way, such as by a virus, by software or hardware failure, or degradation with the passage of time, so that it is partially or completely unreadable by a computer.
COTS (Commercial Off-the-Shelf): Hardware or software products that are commercially manufactured, ready-made, and available for use by the general public without the need for customization.
CPI: Characters Per Inch.
CPU (Central Processing Unit): The primary silicon chip that runs a computer’s operating system and application software. It performs a computer’s essential mathematical functions and controls essential operations. Also known as Microprocessor.
CRC (Cyclical Redundancy Checking): Used in data communications to create a checksum character at the end of a data block to ensure integrity of data transmission and receipt. See Checksum.
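A minimal sketch of appending a CRC to a data block and checking it on receipt, using the CRC-32 function in Python's standard zlib module. The 4-byte trailer layout and the names add_crc and check_crc are assumptions for illustration; actual communication protocols define their own CRC variant and framing.

```python
import zlib


def add_crc(payload: bytes) -> bytes:
    """Append the payload's CRC-32 as a 4-byte trailer (sender side)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_crc(block: bytes) -> bool:
    """Recompute the CRC-32 over the payload and compare with the trailer (receiver side)."""
    payload, trailer = block[:-4], block[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == trailer


sent = add_crc(b"data block")
received_ok = check_crc(sent)

# Simulate a transmission error by altering the first byte of the block.
corrupted = bytes([sent[0] ^ 0x01]) + sent[1:]
received_bad = check_crc(corrupted)
```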
CRM (Customer Relationship Management) Application: Applications that help manage clients and contacts. Used in larger companies. Often a significant repository of sales, customer, and sometimes marketing data.
Cross-Custodian De-Duplication: Culls a document to the extent multiple copies of that document reside within different custodians’ data sets. See Case De-Duplication and De-Duplication.
CRT (Cathode Ray Tube): The picture tube of older computer monitors or televisions, to be distinguished from newer “flat” LCD or plasma screens.
Cryptography: Technique to scramble data to preserve confidentiality or authenticity.
Cull (verb): To remove a document from the collection to be produced or reviewed. See Data Filtering, Harvesting.
Custodian: See Record Custodian and Record Owner.
Custodian De-Duplication: Culls a document to the extent multiple copies of that document reside within the same custodian’s data set. Also known as Vertical De-duplication. See De-Duplication.
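The difference between custodian-level and case-level de-duplication can be sketched as below. The glossary entries do not say how duplicates are identified; comparing SHA-256 hashes of file content is an assumption here, and the Doc type, function names, and sample documents are hypothetical.

```python
import hashlib
from typing import NamedTuple


class Doc(NamedTuple):
    custodian: str
    name: str
    content: bytes


def content_hash(doc: Doc) -> str:
    """Fingerprint a document by its content only (assumed duplicate test)."""
    return hashlib.sha256(doc.content).hexdigest()


def custodian_dedupe(docs: list[Doc]) -> list[Doc]:
    """Keep the first copy of each document within each custodian's data set."""
    seen, kept = set(), []
    for doc in docs:
        key = (doc.custodian, content_hash(doc))
        if key not in seen:
            seen.add(key)
            kept.append(doc)
    return kept


def case_dedupe(docs: list[Doc]) -> list[Doc]:
    """Keep the first copy of each document across all custodians in the case."""
    seen, kept = set(), []
    for doc in docs:
        key = content_hash(doc)
        if key not in seen:
            seen.add(key)
            kept.append(doc)
    return kept


docs = [
    Doc("Alice", "memo.txt", b"Quarterly memo"),
    Doc("Alice", "memo_copy.txt", b"Quarterly memo"),
    Doc("Bob", "memo.txt", b"Quarterly memo"),
    Doc("Bob", "notes.txt", b"Meeting notes"),
]
per_custodian = custodian_dedupe(docs)   # 3 documents remain
per_case = case_dedupe(docs)             # 2 documents remain
```

Custodian-level de-duplication leaves one copy of the memo with each custodian, while case-level de-duplication keeps only the first custodian's copy.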
Customer-Added Metadata: See User-Added Metadata.
Cyan: Cyan-colored ink reflects blue and green and absorbs red.
Cylinder: The set of tracks on both sides of each platter in a hard drive that is located at the same head position. See Platter.