Posted on: November 30, 2017 in Blog
Help Your Employees Find the Information They Need with Machine Learning
Using machine learning, organizations can help knowledge workers quickly and efficiently find the information they need. Organizations can use machine learning to manage unused operational data (dark data) and redundant, outdated, or trivial data (ROT), reducing the amount of information knowledge workers have to sift through. Machine learning can improve the search and discovery of information by enabling multi-concept search that produces contextually relevant results. Organizations can also use machine learning to automate information-gathering tasks such as taxonomy building, ontology building, tagging, and document classification, freeing up time for knowledge workers to focus on core tasks.
Machine Learning Can Help Reduce Dark Data and ROT
Organizations are grappling not only with the speed at which unstructured data is now generated, but also with the rapidly growing amount of dark data and ROT. According to the March 2016 Veritas Technologies Global Databerg report, approximately 85% of stored data is either dark data or ROT, leaving only 15% of stored data to be classified by IT leaders as business-critical information. The report also states that if dark data and ROT are not dealt with, they will unnecessarily cost organizations worldwide a cumulative $3.3 trillion to manage by the year 2020.
When it comes to enterprise information ROT, sheer duplication is one of the biggest problems. In searching masses of information, knowledge workers often retrieve the same or similar information again and again. While exact duplicates can be easily detected by hash codes, machine learning methods go beyond that to bring together and prioritize versions, near-duplicates, and conceptually related material. Machine learning applied to search can also shed light on dark data by expanding user searches with conceptually related material and by linking actively used documents with related material that has been gathered but not yet exploited.
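To make the distinction between exact duplicates and near-duplicates concrete, here is a minimal Python sketch. It assumes SHA-256 as the hash code and uses word-shingle overlap (Jaccard similarity) as a simple, non-learned stand-in for near-duplicate detection. The sample documents and function names are hypothetical, and this is not a description of how Brainspace or any other product works.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return a SHA-256 digest of the text; identical texts give identical digests."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word n-grams (shingles) in the text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity of two texts' shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


# Hypothetical sample documents, invented for illustration.
original = "The quarterly sales report shows revenue growth in all regions"
exact_copy = "The quarterly sales report shows revenue growth in all regions"
revised = "The quarterly sales report shows strong revenue growth in all regions"
unrelated = "Please remember to submit your travel expenses before Friday"

# A hash comparison catches the exact copy but misses the one-word revision.
is_exact_duplicate = content_hash(original) == content_hash(exact_copy)
revised_is_exact = content_hash(original) == content_hash(revised)

# Shingle overlap still scores the revision as similar and the memo as unrelated.
near_score = jaccard(original, revised)
unrelated_score = jaccard(original, unrelated)
```

A single added word changes the hash completely, so the revised report is invisible to hash matching, while the shingle score stays well above that of the unrelated memo. Grouping conceptually related material that shares little wording needs more than this kind of surface overlap.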
“Brainspace’s core technologies in conceptual search, unsupervised learning, and natural language processing allow cutting through masses of unstructured data to find its core value,” says information retrieval pioneer Dave Lewis, who recently joined Brainspace as Chief Data Scientist. “The combination of these technologies with powerful visualizations means that no upfront cleanup, indexing, or coding is necessary before enterprise data can be analyzed and unsuspected connections uncovered.”
Machine Learning Allows Information Gathering Tasks to be Automated
Organizations that have yet to leverage machine learning often have teams of knowledge workers spending much of their time performing information-gathering tasks such as taxonomy building, ontology building, tagging, and document classification. Many of these organizations are finding that the amount of unstructured data generated from within the business is growing at a rate that is simply too fast for knowledge workers to manually tag, classify, and maintain.
Machine learning allows information-gathering tasks to be automated so that knowledge workers can focus on core tasks. Machine learning can also perform these tasks at speeds far beyond human capabilities.
A machine learning platform with strong document classification capabilities can even reduce the need for traditional taxonomies and ontologies altogether.
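As an illustration of automated document classification, the following Python sketch trains a small multinomial naive Bayes classifier on a handful of labeled examples and then assigns labels to new text. Naive Bayes is chosen here only because it is simple to show. The training documents, labels, and function names are hypothetical, and a production system would use far more data and more capable models.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count words per label for a multinomial naive Bayes model."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in labeled_docs:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Start from the prior: how common this label is in the training set.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Laplace (add-one) smoothing so unseen words do not zero out the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training examples, invented for illustration.
TRAINING_DOCS = [
    ("invoice payment due amount total", "finance"),
    ("quarterly budget revenue forecast", "finance"),
    ("contract agreement liability clause", "legal"),
    ("court filing discovery evidence", "legal"),
]

model = train(TRAINING_DOCS)
finance_label = classify("revenue forecast and payment schedule", model)
legal_label = classify("liability clause in the contract", model)
```

Once trained, the same model can label any number of new documents without further manual tagging, which is the time saving described above.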
Machine Learning Can Help Improve Search and Discovery of Information
Every second, massive streams of unstructured data are generated from emails, social media sites, smartphones, sensors, wearable electronics, and many other data sources. The magnitude, variety, and velocity of unstructured data are staggering; for knowledge workers (and anyone looking for specific information), trying to find the information they need is like looking for a needle in an infinite number of haystacks.
Search engine companies like Google and Microsoft Bing are using machine learning and artificial intelligence to help millions upon millions of people search the web to find information about pretty much everything. According to the Internet Live Stats website, there are 55,218 Google searches every second at the time of this writing.
Knowledge workers need to find and use information from multiple sources
While Google and Microsoft Bing are using machine learning and artificial intelligence to improve the search and discovery of information on the web, knowledge workers often need information that can only be found within their organizations: in document management systems, intranets, extranets, email archives, and portals. To do their jobs successfully, knowledge workers must quickly find and use information from many different sources, including the web.
Many organizations have neither an effective data management system in place nor effective search tools for knowledge workers. Organizations need to ensure that knowledge workers can quickly and effectively find relevant information, whether it comes from sources within the enterprise or from the web.
Knowledge workers need more than traditional keyword search
Traditional search engines work best if the user knows exactly what they’re looking for and can provide relevant keywords for their search. Knowledge workers often search for information without knowing exactly what they’re looking for and sometimes without knowing exactly what to ask for. If a knowledge worker doesn’t know the exact keywords to enter, a traditional search engine may not return results that are relevant to the query. If a traditional search engine does return results that are relevant to the query, and the user would like to see similar results, the search engine may not be able to provide results that are conceptually similar. In addition, the search engine may return thousands of results that the worker does not have time to sift through and read.
Machine learning allows for truly semantic, multi-concept search
Traditional keyword search is one-dimensional; it provides results that contain the specific keywords, but often fails to return results that are conceptually related to the original search query. Knowledge workers need search tools that allow them to search for information using concepts instead of keywords. Using machine learning, organizations can make it possible for knowledge workers to search for information using multi-concept search that produces contextually relevant results.
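The gap between keyword matching and concept matching can be shown with a toy Python sketch. The documents are hypothetical, and the hand-written `RELATED_TERMS` map is only a stand-in for relationships that a machine learning system would learn from the corpus. It does not represent any vendor's actual technique.

```python
# Hypothetical documents, invented for illustration.
DOCUMENTS = {
    "doc1": "guidelines for staff vacation and holiday requests",
    "doc2": "annual leave policy for all employees",
    "doc3": "server maintenance schedule for the data center",
}

# Hand-written stand-in for term relationships a learned model would supply.
RELATED_TERMS = {
    "vacation": {"leave", "holiday"},
    "staff": {"employees"},
}


def keyword_search(query, documents):
    """Return ids of documents containing every query word verbatim."""
    terms = query.lower().split()
    return sorted(
        doc_id for doc_id, text in documents.items()
        if all(term in text.lower().split() for term in terms)
    )


def concept_search(query, documents, related_terms):
    """Return ids of documents matching each query word or one of its related terms."""
    results = []
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        if all(words & ({term} | related_terms.get(term, set()))
               for term in query.lower().split()):
            results.append(doc_id)
    return sorted(results)


keyword_hits = keyword_search("staff vacation", DOCUMENTS)
concept_hits = concept_search("staff vacation", DOCUMENTS, RELATED_TERMS)
```

For the query "staff vacation", keyword search finds only `doc1`, while the concept-aware version also returns `doc2`, which is about the same topic but contains neither query word.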
Machine learning allows for truly semantic, multi-concept search. While many semantic search engine platforms use machine learning, they are not all created equal. Dave Copps, Brainspace founder and CEO, says that “semantic search engines are able to go beyond keyword matching and match on concepts. In fact, the more powerful semantic search engines will sometimes produce relevant search results that do not contain any of the original query words bringing an element of serendipity to search.”
Using machine learning and artificial intelligence, Brainspace has built a truly semantic search platform that is capable of automatically and continually learning, scaling intelligence, and understanding the intent and context of the user in order to return the most relevant results. “Typically semantic technologies look to average all concepts in a corpus—find a semantic center. This is a significant deficiency in other semantic search technologies,” says Copps. “Multi-concept enables our machine learning processes to achieve a more human-like learning from any given corpus because of its recognition of multiple themes from within a single section of text.”
One of the biggest challenges for knowledge workers is finding relevant information from the many vast sources of data available; 44% of the time, knowledge workers cannot find the information they need. According to an IDC report, an enterprise with 1,000 knowledge workers loses $5.7 million on average every year because of lost productivity caused by workers searching for, but not finding, relevant information.
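To put the IDC figure in per-person terms, here is a small worked calculation in Python. The two input figures come from the report cited above. The linear scaling to other headcounts, and the 250-worker example, are assumptions made for illustration and are not claims from the report.

```python
# Figures quoted from the IDC report cited above.
ANNUAL_LOSS_USD = 5_700_000
KNOWLEDGE_WORKERS = 1_000

loss_per_worker = ANNUAL_LOSS_USD / KNOWLEDGE_WORKERS


def estimated_annual_loss(worker_count: int) -> float:
    """Scale the per-worker figure linearly (an assumption, not a claim from the report)."""
    return loss_per_worker * worker_count


print(f"Per worker: ${loss_per_worker:,.0f} per year")
print(f"250 workers: ${estimated_annual_loss(250):,.0f} per year")
```

The reported loss works out to $5,700 per knowledge worker per year, so under the linear assumption a 250-person team would lose roughly $1.4 million annually.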
There is simply too much unstructured data generated every second from within organizations and external sources; there are too many vast streams of data moving at lightning speed. The amount of unstructured data generated from within organizations alone has grown far beyond the human capacity to search through, organize, and manage.
Organizations can no longer expect knowledge workers to manually tag, classify, and maintain the massive amounts of unstructured data generated from within the business each and every day. Nor can they expect knowledge workers to find the information they need using only traditional keyword search. What organizations can expect is that modern search tools powered by machine learning, artificial intelligence, and other advanced technologies will help knowledge workers quickly and efficiently find the information they need.
Semantic search platforms powered by artificial intelligence and machine learning, like Brainspace, can help organizations give knowledge workers the means to find the information they need to do their jobs successfully. The Brainspace platform is highly scalable and can analyze the world’s largest unstructured datasets, identifying the relationships between words, phrases, and categories without the need for human intervention. Brainspace can provide conceptually and contextually relevant search results to users, all while actively learning and dynamically adapting to new content.