The relentless growth of Big Data continues to create pervasive and persistent problems as organisations grapple with how best to retain, access, discover and ultimately delete content in compliance with evolving regulations. That growth is compounded by changing data retention requirements and industry regulations, which require some types of data to be kept for anywhere from a few years to indefinitely.
Big Data plagues many stakeholders, from IT to Legal. While IT departments grapple with how to support complex Big Data environments, legal teams are tasked with making accommodations for Big Data in the already expensive eDiscovery process.
The world generated more than one zettabyte (ZB), or one million petabytes (PB), of data in 2010. This year, the growth is predicted to reach 72 ZBs a year, fuelled in part by the rapid rise of machine-generated data. The mix of structured data (e.g., records from programmed trading and financial transaction systems, intelligent meters, and call-detail records for smartphones and tablets), unstructured data (e.g., images, audio or video files) and semi-structured data (e.g., emails and logs) adds yet another layer of management complexity, especially when determining the most efficient and reliable way to ingest, protect, organise, access, preserve and defensibly delete all this vital information.
In sifting through voluminous Big Data to find responsive information, organisations can spend millions of dollars to isolate relevant Electronically Stored Information (ESI) and even more to review it. Simply put, the Big Data problem brings new meaning to the phrase, “looking for a needle in a haystack.”
Companies can begin to view data backups and archives more strategically, using integrated solutions to lower storage costs and compliance risks. Technology solutions need to support a flexible strategy that adapts as the needs of the business evolve.
Crossing Big Data’s Backup and Archive Chasm
For many organisations, backup and archive functions are deployed and maintained as separate “silos” within an overall information management strategy. Multiple, disparate hardware and software products typically manage these data silos, which leads to duplicate copies of information that must be protected and preserved. Additionally, legal pressure to find and preserve data typically creates yet more silos or, in the worst case, indefinitely extended retention of information assets because the organisation has inadequate visibility into what it is keeping.
Storage and backup administrators oversee data protection and are heavily focused on the impact Big Data has on backup windows, recovery SLAs and infrastructure costs. Information management buyers, by contrast, are fixated on how Big Data affects data retention, discovery and information governance policies, and often operate without regard to the operational impact of these policies.
As a result, a chasm exists between these two critical constituents in ongoing Big Data conversations. According to Gartner, backup complements archive and vice versa – yet most tools and technologies address either one or the other of these disciplines. Gartner, among others, predicts that being able to look at backup and archive holistically promises significant cost reduction and risk management benefits. The convergence of backup and archive is an emerging concept that’s gaining traction as organisations seek solutions to reduce the number of copies created for backup and archiving while more closely aligning data access policies for both.
Taking a United Front on Data Convergence
One way to accomplish this is to unify backup and archive, but doing so requires cross-functional teaming to ensure that the needs of every stakeholder in the business are met. This starts with developing a better understanding of how applications, users and critical business processes need to access data throughout its lifecycle. This process will uncover many of the hurdles that thwart streamlined access to individual and corporate data across the enterprise. It will also identify areas where limited visibility into vital information assets has created undue exposure to compliance and information governance risks.
The notion of a single data repository that eliminates redundancies and separate silos is compelling on many levels. A holistic approach that captures data once and then repurposes it for data protection and preservation is key to getting the right data into the hands of the right people so they can turn it into something more meaningful and actionable for the business.
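The "capture once, repurpose" idea can be illustrated with a minimal sketch. It assumes a content-addressed store in which identical content is kept once and both backup and archive catalogues simply point at it. The class `UnifiedRepository`, its methods and the sample file names are hypothetical and do not describe any particular product.

```python
import hashlib


class UnifiedRepository:
    """Hypothetical single store shared by backup and archive functions."""

    def __init__(self):
        self.blobs = {}    # content digest -> bytes, each unique content stored once
        self.catalog = {}  # (purpose, name) -> content digest

    def capture(self, name, content, purposes=("backup", "archive")):
        """Store content once, then register it for each purpose."""
        digest = hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(digest, content)  # no second physical copy
        for purpose in purposes:
            self.catalog[(purpose, name)] = digest
        return digest

    def physical_copies(self):
        return len(self.blobs)

    def logical_references(self):
        return len(self.catalog)


repo = UnifiedRepository()
repo.capture("q3-report.docx", b"quarterly figures")
repo.capture("q3-report-copy.docx", b"quarterly figures")  # duplicate content
repo.capture("contract.pdf", b"signed contract")

physical = repo.physical_copies()    # unique pieces of content actually stored
logical = repo.logical_references()  # backup and archive entries pointing at them
print(physical, logical)
```

In this sketch three captured files produce six catalogue entries (one backup and one archive entry each) but only two stored objects, because the duplicate report shares content with the original.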
Moreover, a repository that can be searched with a single query enables legal teams to obtain the most comprehensive results to an eDiscovery request in the least amount of time. A single collection helps ensure that every data source is accounted for in a discovery effort, so that all case-critical data has been collected and preserved and is ready for review. A central place to delete data also reduces both the cost and the risk of inadvertently storing multiple copies. Understanding large data pools well enough to extract and collect relevant subsets for both reactive and proactive eDiscovery can significantly reduce cost and risk.
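As a rough illustration of a single collection that serves both discovery and deletion, the sketch below runs one keyword query across all items and deletes expired items from one place. The legal-hold flag that blocks deletion is an assumption added for illustration, and the names `Item`, `Collection`, `place_hold` and `delete_expired`, as well as the sample data, are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Item:
    item_id: str
    text: str
    retain_until: date
    legal_holds: set = field(default_factory=set)  # matters preserving this item


class Collection:
    """Hypothetical single collection covering every data source."""

    def __init__(self):
        self.items = {}

    def add(self, item):
        self.items[item.item_id] = item

    def search(self, keyword):
        """One query across the whole collection (case-insensitive substring)."""
        kw = keyword.lower()
        return sorted(i.item_id for i in self.items.values() if kw in i.text.lower())

    def place_hold(self, matter, item_ids):
        for item_id in item_ids:
            self.items[item_id].legal_holds.add(matter)

    def delete_expired(self, today):
        """Delete items past retention unless a legal hold applies (assumed rule)."""
        expired = sorted(
            i.item_id
            for i in self.items.values()
            if i.retain_until < today and not i.legal_holds
        )
        for item_id in expired:
            del self.items[item_id]
        return expired


store = Collection()
store.add(Item("mail-1", "Pricing terms for the Acme deal", date(2015, 1, 1)))
store.add(Item("log-7", "nightly job completed", date(2015, 1, 1)))
store.add(Item("doc-3", "Acme contract draft", date(2030, 1, 1)))

hits = store.search("acme")                       # responsive items
store.place_hold("matter-42", hits)               # preserve them for review
deleted = store.delete_expired(date(2020, 1, 1))  # only unheld, expired items go
remaining = sorted(store.items)
print(hits, deleted, remaining)
```

Here the expired email survives because it is on hold, while the expired log entry is removed once, from the only place it is stored.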
A converged data protection and retention strategy allows for centralised reporting that enables business and IT leaders to make more informed decisions with their data while bolstering analytical skills. Organisations can extend their view into the business with embedded intelligence and analytical tools that provide granular insights into the ever-evolving role data can, and should, play in driving business direction.
Most important, companies can strike a balance between capturing too much data and capturing too little, as both scenarios pose potentially serious business risks. Armed with appropriate insight and tools, it’s possible to verify whether all data sources have been collected across the enterprise. With robust reporting and predictive tools, it’s much easier to forecast, analyse and budget properly for the ongoing onslaught of Big Data. Reporting can also be used in the eDiscovery process to defend the data collection and preservation methodology of an organisation responding to litigation, a regulatory request or an internal investigation.
Forward-thinking companies that embrace a unified approach to managing both backups and archives will be able to take full advantage of a future-proof solution that elevates overall information management while providing appropriate access to business-critical information as it ages.