Over the last several years, the desire to understand the surging rivers of digital data forming all around us has led inexorably towards something more meaningful than simple analysis. The rise of consumer analytics, by which I mean analytics tools that literally anybody could and would use, could arguably be said to have begun with the introduction of Web analytics. After all, practically everyone (and every business, large or small) now has a Web presence. The push to understand the traffic and interaction of this ever-more-important touchpoint with the world has grown steadily over the last two decades. The old business intelligence solutions of yore tended towards the sensibility of serious-minded scientific and professional tools, with the complexity and learning curve to match. In stark contrast, the current crop of populist analytics tools (examples: Google Analytics and, more recently, mobile-friendly services such as Clicky or analytics aggregators like Trakkboard) makes it drop-dead easy to see the underlying numbers.
But in the end, numbers just haven’t been enough, especially when it comes to unstructured data like social media, where the most important information isn’t going to be neatly bucketable into categories like clicks or bounce rates. The vagaries of human conversation are messy, hard to process, and seemingly worst of all, unpredictable. Just as challenging, the flood of data from these sources is vast. However, though sensors (such as those in our smart devices like phones or tablets, or in our Internet-connected homes and offices) create a formidable and rapidly growing surface area for big data, this data generally falls into the large, yet well-understood category. In contrast, the most difficult listening and analytics processes to operationalize are ones in which 1) the information is routinely formatted in unique ways and 2) excessive human intervention is required to process or respond to it effectively. To succeed, both of these must be ameliorated and, if possible, dramatically improved by understanding the implications of the unanticipated and then correctly formulating a useful business response. Anything short of this makes the scale problem untenable for businesses: Mountains of data requiring mountains of people to service them defeat the very purpose of using technology in the first place.
I used the word “operationalize” in the previous paragraph, because that’s what many companies set out to do in 2011: Begin building business processes that were supported by new types of listening and analytics tools in an effort to make sense of their enterprise data, social data, and other data sources. In the process they gave rise to the term ‘big data’, a whole host of existing and new technologies and processes integrated into a “stack” that can start cracking both the data volume problem and the messy data problem (and opportunity) inherent in data pools such as social media. One of the root causes: Traditional analytics and business intelligence largely fell behind the needs of businesses to start seeing all of the data relevant to them, whatever the form, and to understand what it really meant quickly and easily enough to actually do something about it.
The now-famous McKinsey report on big data last year had much to say on the topic, including some extremely compelling case studies and examples, and it’s required reading these days. But while the domains of healthcare, government, retail, manufacturing, and telecommunications have much of the industry-specific focus on big data, the one general domain where most businesses of any kind will be impacted is social media. This is where tools, platforms, vendors, techniques, skills, training, and much more will have to be developed and brought to bear. To be sure, a good amount of what has already been developed for the fields of analytics, data warehousing, business intelligence, databases, visualization, natural language processing, and more will be usable. But much of it won’t be, or has become obsolete, because of the vast pools of deep data accumulating around our organizations and their unique structure.
Therefore, a new discipline is being synthesized out of the existing pieces that are providing important foundational elements for big data. In many cases entirely new elements will have to be introduced as well. These include: 1) radical solutions to address scale and performance issues, 2) edge-of-the-envelope machine learning capabilities à la IBM’s Watson that can remove human decision making when it’s unnecessary, 3) unsupervised identification of strategic business concerns and opportunities, and 4) an operational construct that directly affects the course of the business (as opposed to generating automated reports that sit unread inside an e-mail attachment). Big data has the promise to deliver on all of these, and in ways that will drive better innovation, competition, and bottom-line results. It will take time, but hopefully not hard work, except for those who produce the tools, although that may yet be too much to ask.
For now, 2012 will be an experimental year, a year spent preparing the fundamentals and figuring out the ways in which the pieces of the many disciplines that big data draws from will fit together. Many organizations will be trying out new ways to integrate big data ideas into their business while others will be putting the tools through their paces. But the smart organizations will be doing both, since big data is as much about process as it is about technology. All will be learning what works and what doesn’t (for them), and seeing where the holes in their organization are for enabling the outcomes. There’s little question that big data will be one of the biggest IT and business stories of the year, but exactly how and why is unwritten so far.
10 Big Data Predictions for 2012
Here are what I think are likely to be the most significant big data happenings this year:
- Data scientists will be in short supply, while data warehouse and BI folks will try to migrate over. Yet a lack of experienced big data architects will represent the real hold-up for now. As Tom Groenfeldt of Forbes says, it’s really a matter of degree when it comes to labeling someone a big data scientist. It’s also clear that practitioners of the precursor fields of big data will be lining up to get involved, yet they often lack the new thinking required to master the field. But the biggest shortage, in my opinion, will be in enterprise-scale strategists capable of crafting and realizing a big data vision, one step at a time.
- Analytics vendors (social and otherwise) will start down the big data path. Many won’t get far, but a few will make the transition. When venerable old-guard analytics companies like SAS start releasing big data reports, you know it’s the buzzword du jour. Yet this is inevitable with any important new technology trend. What’s more significant is that very few vendors will have a comprehensive blueprint or framework for big data. Most will be providing point solutions, and that’s fine. In this early wave, big data suites as such really don’t exist, and it’s up to companies to curate a set of capabilities.
- Everyone will label everything big data in 2012, making it hard to see what makes the approach or technology stand apart and provide a unique solution. The issue of signal-to-noise with big data marketing and hype will threaten to obscure its real meaning for some, yet others will take what big data represents (a set of innovative new approaches to solving new and long-standing business problems in a much more agile, integral, and high-impact manner) as a call to significant action. The Register recently published an effective cross-check of what big data really means to those on the ground. Regardless of the term itself, from their surveys it’s clear there’s a broad perception that big data will let organizations tackle problems that were previously ‘too hard or too expensive’.
- Companies looking for instant nirvana with one-click setup and zero-configuration of big data solutions will increasingly have their needs met, but slowly at first. The problem with effective big data is that it’s not just about predetermined buckets or templates for business intelligence; it’s about meaningful analysis and processing of information in a way that’s highly relevant to the business.
- Codifying the domain of a business in order to ‘teach’ an organization’s big data platforms will turn out to be one of the outstanding challenges, but some initial solutions will emerge. Most of the better big data stories, such as the health care company that’s collecting ubiquitous fertility data, instead of just from those having fertility problems (and skewing it for everyone), are specific to a domain or industry. In other words, they’re custom-built and designed. Big data is often more about the democratization of data, as Bradford Cross once put it: data liberated for use inside and across the business, instead of limited to data scientists in white lab coats tackling a small number of well-defined problems. To do this, we need better ways to adapt big data appliances to the details of our business. Important initial headway is being made here (example: Appistry’s industry-specific big data solutions) and I think we’ll see much more of it this year, especially in social business fields such as social marketing, Social CRM, social product development, and crisis management, as well as specific domains such as life sciences, defense, and especially financial services.
- Consumerization of big data will be one of the primary vectors into the organization for tactical needs, making ‘shadow’ big data a nascent but important new trend. I’ve made the argument that Google search is a great example of a simple big data appliance anyone can use. It analyzes the contents of most of the world’s Web sites in near real-time, allows all of it to be quickly searched using a simple interface, provides recommendations when it thinks you’re asking the wrong question, and so on. It’s in use at virtually every company in the world. It will be followed by numerous SaaS big data services over the next few years that will bring consumer-like simplicity and power to the field. They will be so easy to start using that many workers will prefer them to any homegrown solution. While this won’t always be the case (partially because internal data is typically quite difficult for the average worker to load into external services), companies will see plenty of unsanctioned big data solutions. Not that I think this is a real problem.
- The more bureaucratic the company, the more it will struggle to embody its strategy and policy in an operational big data life cycle, despite this being the best way to obtain value. It’s hard for rigid processes and hierarchies to change, and companies that are either poor at using technology to solve problems or not very agile will face more of an uphill battle in acting on big data. I don’t expect big data to appear in these organizations in 2012, but early adopters will appear in technology, finance, healthcare, insurance, government, media, and retail businesses if they either face stiff competition or are already growing (and hence already changing) more rapidly than other businesses. Correspondingly, big data vendors that supply these industries will experience the most lift.
- Rich data, such as audio, images, and video, will remain opaque to most organizations this year, despite advances in machine analysis of these formats and their growing prevalence. I’m basing this on looking at most big data offerings today, which don’t emphasize these types of data very much, if at all, despite the explosion of images, audio, and high-definition video in recent years. For now, the lion’s share of big data will focus on textual processing.
- On the other hand, social media and big data (because it will usually not be opaque) will have significant lift this year, though semantic processing will remain in its early stages. I recently identified nine top use cases for social business intelligence, for which big data will be a leading solution. Social media, because it requires linguistic and natural language processing, is an ideal candidate for big data analytics, yet it is areas requiring sophisticated link analysis, like automated reputation/influencer tracking and segmentation, that will be of primary interest this year. Perhaps more importantly, big data will “complete” social business as a capability that allows the company to listen to and intelligently analyze its constituents contextually and at scale (see figure above).
- Despite all of this, companies will do surprisingly well integrating early big data capabilities in their social business efforts in particular. That the big software vendors are moving into the big data fray (such as with Oracle’s new big data appliance) speaks volumes. And offerings are focusing on the ease-of-use factor, both in installation and maintenance as well as operation, clearly understanding the immense competition and pressure they’ll be experiencing from online providers. I’ll be publishing a breakdown of the consumer big data app soon, but I’m virtually certain it will show a list of well-known and up-and-coming household names that you’ll be seeing in the enterprise this year as companies throw big data solutions at their issues to see what sticks.
Of course, many other interesting things will happen in big data as well, but this is a good start. I’m hoping we’ll see how the real strengths and weaknesses of big data fall out over the coming 12 months. Please leave your comments and your own predictions below.