Most of the data that plagues organizations today is unstructured: 70 to 80 percent of it, according to some studies.
Unstructured data is typically controlled and “managed” by individual employees. Much of this unstructured data is considered “dark data” because it is unclassified and not easily accessible by the organization.
Instead of being stored in managed ECM systems, this unstructured data is most often stored on employee workstations, removable media, enterprise file shares, or even outside the organization’s control in personal cloud storage accounts.
- Dark data poses a growing cost to the organization, in both storage and liability, because it is still a company asset and falls within the organization’s scope of responsibility.
- To gain control of your dark data and move from your company’s Dark Age into the Digital Transformation Age, identify the records and information that either have discrete value or pose the greatest risk, and act on them before your dark data becomes a black hole.
As reported by Osterman, the Compliance, Governance and Oversight Council (CGOC) conducted a survey in 2012 which revealed that, on average, 1 percent of organizational data is subject to legal hold, 5 percent is subject to governmental regulatory retention requirements, and 25 percent has some business value.
This basic breakdown suggests that approximately 69 percent of an organization’s retained data has no obvious business value and could be disposed of without legal, regulatory or business consequences. The probability that data will be reused also drops off quickly, approaching 1 percent after just 15 days. Osterman Research found in its own survey that only 46 percent of organizations have a defensible disposition program in place.
The limited potential for reuse, coupled with statistics showing that only 31 percent of this dark data is required for legal, business or regulatory reasons, presents a compelling opportunity to focus on the people, processes and tools that can identify, classify and separate this portion of the data and shed some light on it.
Conversely, there is an equal opportunity to use these same tools and processes to identify and possibly destroy the remainder of this dark data, so long as the organization has defined a defensible disposition strategy.
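The 31 percent and 69 percent figures follow directly from the survey shares quoted above. A minimal check of the arithmetic, assuming the three categories do not overlap (the variable names are illustrative only):

```python
# Shares of retained data from the CGOC survey cited above, in percent.
legal_hold = 1
regulatory_retention = 5
business_value = 25

# Assumption: the three categories do not overlap, so they can simply be summed.
must_keep = legal_hold + regulatory_retention + business_value
disposable = 100 - must_keep

print(f"Needed for legal, regulatory or business reasons: {must_keep}%")
print(f"No obvious value, candidate for disposition: {disposable}%")
```

If the categories do overlap in practice, the share that must be kept would be smaller than 31 percent and the disposable share correspondingly larger.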
What are the top 4 defensible disposition actions that file analysis and disposition enable?
If redundant, outdated and trivial (ROT) data makes up roughly 70 percent of the dark data we’ve defined, then organizations must find a way either to identify what has value or to determine what does NOT have value. Either strategy requires a repeatable methodology that, when documented and used consistently, facilitates defensible destruction of dark data, regardless of whether it is stored in a file share, a cloud service provider, an email system or some other system.
As records are identified as having legal, business or regulatory value, they can be classified or tagged appropriately. The resulting metadata can then support migrating the information to active repositories and systems, where the data can deliver value that would otherwise have been lost in the black hole. Many organizations are using these file analysis and remediation strategies to rein in dark data and bring it to light by moving it into collaboration and knowledge management systems such as SharePoint.
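As one illustration of what a repeatable way to flag ROT candidates might look like on a file share, the sketch below walks a directory tree and groups files into redundant (identical content), outdated (not modified for a long time) and trivial (empty or temporary) candidates. The thresholds, the suffix list and the function name `find_rot` are assumptions made for this example, not features of any particular file analysis product, and a real program would take its rules from the organization’s retention policy.

```python
import hashlib
import time
from pathlib import Path
from typing import Dict, List, Optional

STALE_AFTER_DAYS = 3 * 365           # hypothetical "outdated" threshold
TRIVIAL_SUFFIXES = {".tmp", ".bak"}  # hypothetical "trivial" file types


def find_rot(root: str, now: Optional[float] = None) -> Dict[str, List[Path]]:
    """Group files under root into redundant, outdated and trivial candidates."""
    now = time.time() if now is None else now
    first_seen: Dict[str, Path] = {}
    report: Dict[str, List[Path]] = {"redundant": [], "outdated": [], "trivial": []}

    # Sorted traversal keeps the result the same from run to run.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        stat = path.stat()

        # Trivial: empty files or temporary/backup file types.
        if stat.st_size == 0 or path.suffix.lower() in TRIVIAL_SUFFIXES:
            report["trivial"].append(path)
            continue

        # Redundant: same content hash as a file seen earlier in the walk.
        # read_bytes() loads the whole file; large files would need chunked hashing.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in first_seen:
            report["redundant"].append(path)
        else:
            first_seen[digest] = path

        # Outdated: last modified longer ago than the threshold.
        age_days = (now - stat.st_mtime) / 86400
        if age_days > STALE_AFTER_DAYS:
            report["outdated"].append(path)

    return report
```

The output is only a list of candidates. Whether a flagged file is actually destroyed should still depend on the documented disposition rules, including any legal hold.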
As records are identified that have long-term retention requirements but are not accessed frequently, they are generally moved to an organization’s enterprise archive. Archives are not backups or simply electronic content management repositories; rather, they are services made up of people, process and technology that assess records, define retention requirements, transform records to preservation-ready formats and assure integrity, and as a result enable substantial retirement and decommissioning cost savings for the enterprise.
Leave in Place
Identify and classify the information and then proactively continue to manage the information where it resides, typically after the organization destroys the ROT.
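The four actions described above (destroy, migrate, archive, leave in place) can be read as a small decision rule applied to each classified file. The following is a rough sketch only: the metadata fields, the one-year access threshold and the ordering of the rules are all hypothetical, and an actual disposition program would define them in its documented policy.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    DESTROY = "destroy"
    MIGRATE = "migrate"
    ARCHIVE = "archive"
    LEAVE_IN_PLACE = "leave in place"


@dataclass
class FileRecord:
    # All fields are hypothetical metadata a file analysis tool might supply.
    path: str
    has_value: bool            # legal, business or regulatory value identified
    long_term_retention: bool  # subject to a long retention requirement
    days_since_access: int
    managed_location: bool     # already stored somewhere the organization governs


# Hypothetical threshold for "not accessed frequently".
INFREQUENT_ACCESS_DAYS = 365


def choose_action(record: FileRecord) -> Action:
    """Map a classified file to one of the four disposition actions."""
    if not record.has_value:
        return Action.DESTROY  # ROT: no legal, business or regulatory value
    if record.long_term_retention and record.days_since_access > INFREQUENT_ACCESS_DAYS:
        return Action.ARCHIVE  # must be kept, but rarely used
    if record.managed_location:
        return Action.LEAVE_IN_PLACE  # valuable and already governed
    return Action.MIGRATE  # valuable but sitting in an unmanaged location


if __name__ == "__main__":
    sample = FileRecord("share/contracts/2011-lease.pdf", True, True, 900, False)
    print(choose_action(sample).value)
```

Encoding the rules in one place, whatever the tool, is what makes the process repeatable: the same file with the same metadata always receives the same action, which is the property a defensible disposition program depends on.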
Digital Transformation Age
In this digital transformation age, an organization may need to start its journey toward defeating dark data and adopting effective file analysis practices with a business use case. Some of the most compelling use cases include:
- Regulatory Compliance – This category encompasses a variety of use cases, including Pharmaceutical/GxP (Good Laboratory, Clinical, or Manufacturing Practices) and Financial (FINRA), where specific data types must be managed according to a particular regulation.
- Legal & eDiscovery – eDiscovery and legal search tools are often very similar in function and in many cases are the same technologies used in different ways. When there is a specific hold, or when a company is conducting an early case assessment, all data, dark and light, structured and unstructured, needs to be identified and tagged and, if on hold, held, i.e., neither altered nor deleted.
- Security & Privacy – Sometimes overlapping with regulatory compliance, these requirements protect personal data and other sensitive information, whether it relates to employees, patients, intellectual property or something else. Companies must be vigilant about protecting sensitive data, putting controls in place and then auditing that protection. If data is dark, it is only a matter of time before it becomes a financial black hole and a PR black eye.
- IT Infrastructure – Establishing service, application and hardware retirement and decommissioning programs is a fourth strong use case that may be used to start the journey. The cost savings are easy to demonstrate, and the sky is the limit in terms of what your organization can achieve once the data is transformed through file analysis and remediation.
There are many naysayers in the records management community who have tried auto-classification engines in the past and will argue that the engines could not get the job done: they could not reach a level of precision or accuracy acceptable to the company’s legal department.
The reality is, companies are using these tools successfully today.
The keys to success are understanding the limits of the available tools and planning the service that encompasses and supports the toolsets employed. It is also critical to have business engagement to help define business rules for classification and to help identify other resources that support the process, such as ontologies, thesauri and taxonomies.
It is equally important to remember that this is not an IT problem; it is a business-wide organizational challenge.
Even if an organization tried all of this and failed several years ago, there is reason to take another look. Newer tools no longer focus on text and language alone; they consider multiple dimensions, including document structure, context, language and text, and the results are compelling.