Over the past half-decade, the Big Data flame has spread like wildfire throughout the enterprise, and the IT department has not been immune. The promise of data-driven initiatives capable of transforming IT from a support function to a profit centre has sparked enormous interest.
While the Four Vs of Big Data – volume, velocity, variety, and veracity – are intended to serve as pillars upon which to construct Big Data efforts, there’s a fifth V that needs to be included, and that’s value. Every Big Data initiative should begin with the question “What value do I want to derive from this effort?” How a group or organisation answers that question should deeply inform the means by which that end is achieved. To date, however, value has very much been the ‘silent v’.
The term “data gravity” was coined by Dave McCrory, the CTO of Basho Technologies, and refers to the pull that data exerts on related services and applications. According to McCrory, data exerts this gravitational pull in two key ways. First, without data, applications and services are virtually useless. For this reason, application and service providers naturally gravitate toward data, and the bigger the data set, the more applications and services it will attract.
Second, the bigger the data set, the harder it is to move. Generally it’s more efficient and cost-effective to perform processing near where the data resides. We’ve seen large companies use cloud-based services for IT operations data. If the data itself originates in the same cloud, this approach is fine. Even data generated on-premise can be stored and analysed in the cloud if it’s small enough. For large amounts of data generated outside the cloud, however, problems arise.
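To get a rough feel for why large data sets are hard to move, the sketch below estimates how long a transfer would take over a network link. The function name, the data sizes and the link speed are illustrative assumptions, not figures from any real deployment, and the calculation ignores protocol overhead, contention and retries, so real transfers would likely take longer.

```python
def transfer_time_hours(size_terabytes, bandwidth_gbps):
    """Estimate the hours needed to move a data set over a network link.

    Assumes decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits/s)
    and a link that is fully and exclusively available for the transfer.
    """
    bits_to_move = size_terabytes * 1e12 * 8
    seconds = bits_to_move / (bandwidth_gbps * 1e9)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical data set sizes moved over a hypothetical 1 Gbps link.
    for size_tb in (1, 10, 100):
        hours = transfer_time_hours(size_tb, bandwidth_gbps=1)
        print(f"{size_tb} TB over 1 Gbps: about {hours:.1f} hours")
```

Under these assumptions the transfer time grows linearly with the size of the data set, which is one reason it tends to be cheaper to bring the processing to the data than the other way round.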
Threat detection systems, which have been in the news in association with high-profile data breaches, are one example of this problem. The low signal-to-noise ratio of these systems means that alerts are often ignored altogether, and actual threats are missed amidst the chaos. Finding the signal in all that noise can be hard, and when time is of the essence, cutting through the noise can become mission-critical. If you’re sifting through garbage, the chances of finding what you need in time drop dramatically.
Consider the motion of your data. Is the data you’re trying to analyse at rest or in flight? The answer to this question has a huge impact on how you process, view, and analyse the data, as well as the value you can derive from it.
Most Big Data is at rest and analysed post hoc in batch processes that rely on indexing and parallel processing using techniques based on sharding or MapReduce. At its core, this approach is all about volume and variety, and enterprises are leveraging multiple frameworks and data stores – such as Hadoop, MongoDB and Cassandra – for a variety of structured and unstructured data. While multiple data sources provide context and insight, this approach is always going to be retrospective.
Recently, greater attention has been paid to data-in-flight as the need for greater agility and adaptability drives demand for higher-velocity analysis. For IT operations, high-velocity data-in-flight is of paramount importance. It gives IT the ability to see how systems are behaving in the moment, compare that behaviour to established baselines, and drill down to find the root cause of a problem.
While data-in-flight can provide incredible value, analysis of this data requires a fundamentally different approach based on stream processing and summary metrics. In many cases, the data volume is such that it cannot practically be stored first and must be processed in-flight. In other cases, the value of the information decays quickly, so real-time data is worth more than older data.
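As a minimal sketch of what stream processing with summary metrics can look like, the example below keeps a running mean and standard deviation (using Welford's online algorithm) instead of storing every measurement, and flags values that stray far from that baseline. The class and function names, the latency figures, the three-standard-deviation threshold and the warm-up length are all illustrative assumptions rather than a description of any particular product.

```python
import math


class RunningStats:
    """Summary metrics maintained one value at a time (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def stddev(self):
        # Sample standard deviation; undefined for fewer than two values.
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))


def find_anomalies(stream, threshold=3.0, warmup=5):
    """Return values deviating from the running baseline by more than
    `threshold` standard deviations, once `warmup` values have been seen."""
    stats = RunningStats()
    anomalies = []
    for value in stream:
        if stats.count >= warmup and stats.stddev > 0:
            if abs(value - stats.mean) > threshold * stats.stddev:
                anomalies.append(value)
                continue  # keep flagged values out of the baseline
        stats.update(value)
    return anomalies


if __name__ == "__main__":
    # Hypothetical response times in milliseconds, arriving one at a time.
    latencies_ms = [100, 102, 98, 101, 99, 100, 103, 97, 500, 101]
    print(find_anomalies(latencies_ms))
```

Only three numbers are held in memory however long the stream runs, which is the trade-off summary metrics make: the individual measurements are discarded, but the baseline is always current.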
It’s important to remember that no single dataset or analytics framework can be all things to all people; most tools that offer a single pane of glass wind up serving nothing more than a single glass of pain. By leveraging multiple datasets as well as analytics and visualisation products optimised for particular data types and goals, IT teams can achieve a complete, correlated, cross-tier view of the environment, enabling them to eliminate waste, create greater efficiency, and maximise scarce resources. This approach spells not only value for IT, but also value for the business as a whole.