There was Volume, there was Velocity and of course, there was Variety.
The three Vs.
Story has it that when the technology world realised that the insatiable thirst for information meant data volumes were reaching a level that surpassed man's understanding of storage, the industry came up with the 3Vs: Volume, Velocity and Variety. To most of us these three words relate directly to the Big Data explosion, and we can thank the Meta Group (now Gartner) for coining this new terminology.
So the three Vs, in the context of big data, go something like this.
The size of available data has been growing at an increasing rate. This applies to companies and to individuals alike. A text file is a few kilobytes, a sound file a few megabytes, while a full-length movie is a few gigabytes. New sources of data are added on a continuous basis. The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s; apparently by 2012, 2.5 quintillion (2.5×10^18) bytes of data were created every day. OK, this is big big data. And getting bigger. Never smaller. Inevitable.
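To get a feel for what "doubling every 40 months" actually does over time, here is a quick back-of-the-envelope sketch (my arithmetic, not a quoted figure):

```python
# If per-capita storage capacity doubles every 40 months, how much
# growth is that over, say, 30 years? (Illustrative numbers only.)

MONTHS_PER_DOUBLING = 40
years = 30

doublings = (years * 12) / MONTHS_PER_DOUBLING  # 360 / 40 = 9 doublings
growth = 2 ** doublings                          # each doubling multiplies by 2

print(f"{doublings:.0f} doublings -> roughly {growth:.0f}x growth")
# -> 9 doublings -> roughly 512x growth
```

Nine doublings in three decades: over 500 times the capacity per person. Never smaller indeed.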
The importance of data's velocity, the increasing rate at which data flows into an organization, has accelerated with the Internet and mobile era as we deliver and consume products and services using a plethora of personal productivity devices and machine-to-machine (M2M) devices, generating a data flow back to the provider.
Rarely does data present itself in a form perfectly ordered and ready for processing. It could be text from social networks, image data, or a raw feed directly from a sensor source such as an M2M network of cameras, digital capture sensors and self-service units. Variety also exists across the devices, browsers and other interactive connectivity we consume it on.
Big data is data that exceeds the processing capacity of conventional database systems. Who would have thought it. To leading organisations, like Wal-mart, Amazon and Google, this power has been in reach for some time, but at fantastic cost. Commodity hardware, cloud architectures and open source software bring big data processing into the reach of the less well-resourced. Big data calls for scalable storage, and a distributed approach to querying. Many companies already have large amounts of archived data, perhaps in the form of logs, but not the capacity to process it.
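That "distributed approach to querying" boils down to a simple pattern: split the data, query the pieces in parallel, merge the partial answers. Here is a toy sketch of the idea (mine, not any platform's actual API), with a thread pool standing in for the cluster of commodity servers that Hadoop and friends would use:

```python
from concurrent.futures import ThreadPoolExecutor
from collections import Counter

def count_words(chunk):
    # The "map" step: each worker counts words in its own chunk of logs.
    return Counter(chunk.split())

def parallel_word_count(chunks):
    # Fan the chunks out to parallel workers, one count per chunk.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(count_words, chunks))
    # The "reduce" step: merge the partial counts into one answer.
    total = Counter()
    for partial in partials:
        total += partial
    return total

# Three "servers", each holding a slice of the archived logs:
logs = ["error warn error", "warn info", "error info info"]
print(parallel_word_count(logs))
# -> Counter({'error': 3, 'info': 3, 'warn': 2})
```

Swap the thread pool for hundreds of machines and the chunks for terabytes of archived logs, and you have the shape of what the big data platforms industrialise.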
The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to "spot business trends, determine quality of research, prevent diseases, combat crime, and determine real-time roadway traffic conditions". Take the Large Hadron Collider. You know, the thing buried beneath the Franco-Swiss border near Geneva and run by CERN. The data flow from all four LHC experiments represents a 25 petabyte annual rate before replication. This becomes nearly 200 petabytes after replication.
So the three Vs put data management on a new level, because, as we are all learning, big data is difficult to work with using most relational database management systems and desktop statistics and visualization packages, requiring instead "massively parallel software running on tens, hundreds, or even thousands of servers". Economically not viable for most. So unless you have deep pockets to buy big data fabric infrastructures to support Apache Hadoop and other big data platforms like MongoDB, Nutch and Pentaho, you are looking to partners to provide such services. Cloud, of course. Oh, BTW, what great names for such platforms and tools. It's like all common sense and naming convention has gone out of the window, and why not, I say 🙂
OK Brummie, so thanks for the lesson on Big Data, but what are the 3Vs in your world?
Well, I see a different set of three Vs that are much closer to home for the typical IT organisation. Perhaps not as sexy as Large Hadron Colliders, NASA systems and Google search servers, but closer to home and, I believe, very pertinent to the IT organization.
Here they are.
The volume of 'events' in a modern IT organization has grown. How much? No one can probably quantify it to the depth of the big data chasers, but I maintain that because of the traditional 3Vs, the IT organization has seen exponential change in the volume of events that need handling. Perhaps I am talking about the classic ITSM set – incident, problem, resolution, helpdesk trend reports etc – but perhaps I am also talking about all of this plus event logs being written, reports being run, printing, self-service events, scanning, data transfers, emails sent, conversations, access requests and so on. How many of these 'events' therefore happen in a given day. In a week. Last year. Who knows, but IT has to handle them either manually, semi-automated or via self-service tools and processes. I wonder how many an average user generates in a day. A month. A year.
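If those events were ever logged uniformly, counting them per user or per day would be trivial. A minimal sketch, assuming a hypothetical event log (the field names and records are mine, not from any real ITSM tool):

```python
from collections import Counter
from datetime import date

# Hypothetical event records, as a real ITSM tool might log them.
events = [
    {"user": "alice", "kind": "incident", "day": date(2013, 5, 1)},
    {"user": "alice", "kind": "print",    "day": date(2013, 5, 1)},
    {"user": "bob",   "kind": "email",    "day": date(2013, 5, 1)},
    {"user": "alice", "kind": "access",   "day": date(2013, 5, 2)},
]

# How many events each user generated, and how many IT handled each day:
per_user = Counter(e["user"] for e in events)
per_day = Counter(e["day"] for e in events)

print(per_user)
print(per_day)
```

The hard part, of course, is not the counting. It is getting all those manual, semi-automated and self-service events into one log in the first place.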
The pace of change to applications – Systems of Record, Systems of Innovation and Systems of Differentiation (see my post about this: http://wp.me/p15xAC-xX) – is driving such velocity of lightweight 'front office' apps, originating from public cloud 'stores', that an empowered mobile user can create 'IP' for themselves that allows them to communicate and be smarter. And totally outside the 'eyes' of the IT organization. Such velocity, whilst intuitive for the user, creates a new dynamic for the IT organization as they look for tools and controls to manage it. Are they there yet? No, of course not, because for many this velocity is not measured – or even seen – by IT. It's hidden. Or in the shadows.
The devices explosion has introduced so much variety of operating systems, hardware types and application choices that the IT organization of the future has to build a command-and-control infrastructure to manage devices that 'they have no knowledge of, nor where they are located'. The idea that a user will create content that the IT department will never see, back up or audit is often a scary (but inevitable) thought. How much variety does an IT organization face each day. Compared to last week. To last year. What about next week. Next year.
Of course my 3Vs are really a subset of the real 3Vs, but I often wonder if an IT organization has ever 'captured' the growth of these 3Vs, rather than getting fixated on the perennial debate: 'do we need big data', 'do we have big data', 'if we don't have big data let's get some' and 'now we have it, what do we do with it'.
Now that would be an interesting infographic to keep an eye on!
PS. I am going to post quite soon on another dynamic I now see: the competing forces squeezing traditional IT as change is driven from outside the IT organization.