Last week I attended Strata, a conference organized by O’Reilly and devoted to big data. It was a large conference (790 attendees) whose content included both technical talks and tutorials about the new generation of big data tools, e.g., Hadoop, Cassandra, visualization, as well as presentations on big data business applications. The diversity and size of the audience and the reported business successes provided a strong indication of how important and popular the area of big data has become.
Big data is pervasive in many of the companies Trident has funded over the last few years. We have invested in companies that generate and/or process big data, e.g., eXelate, Extole, HomeAway, Sojern, Turn, Xata, as well as companies that provide platforms for storing, managing and analyzing big data, e.g., Acteea, Host Analytics, Pivotlink. We recognize that many of the companies we invest in going forward will need to have competence in big data.
There is a big difference between big data and data warehousing, stemming primarily from the nature of the data. Data warehousing was all about analyzing transactional data that was captured from enterprise applications such as an ERP or POS system. In addition to the actual transactions, big data is about capturing, storing, managing and analyzing data about the behavior surrounding transactions, i.e., what happens before and after a transaction. This has several implications. First, it means that the captured data is less structured. It is easier to analyze a collection of purchasing transactions in order to identify a pattern than it is to analyze a series of selections made across a set of web pages in order to establish a pattern of behavior. Second, it implies that meaning must be extracted from events, e.g., the browsing activity prior to buying an item. To be effective in this more open-ended, exploratory data analysis, one has to break through the data silos that are typically found in enterprises and bring all available data to bear. It also means that one must collect all available data rather than trying to decide a priori which data to collect and keep.
Data science is becoming a field. Big data is eliminating the segregation between the people who manage the data, the people who analyze the data, and the people who present/visualize the data. A good data scientist must be able to do all three, though, as I wrote last week, translating business requirements to a data problem and the resulting insights to business actions and value remain largely missing skills in data scientists. Good data scientists are in high demand, as indicated by the jobs being advertised at the conference and as reported at the conference by LinkedIn. They are expected to play a significant role in how their companies evolve. That’s not something we were used to hearing about data analysts, who were always considered fixtures of the back office. I know because I started my career in data analysis.
Corporations have a lot to learn about big data from consumer-oriented companies that generate, manage and analyze big data, e.g., Amazon, eBay, Facebook, Twitter, and LinkedIn, to name a few. This is a reversal of sorts. In the mid-90s, when I was with IBM, I ran an organization devoted to building data warehouses and providing analytical tools and services to Global 1000 companies. At that time various companies, including many of the then nascent Internet companies, were trying to learn from the data warehousing and business intelligence practices of Walmart, Citibank, and First Data. Today such companies would do well to understand and apply the big data techniques being developed by many Internet and social media companies. One big difference is how the two groups approach data stores. Traditional businesses see the enterprise data warehouse as storing the “single version of truth” about the data. Big data stores are viewed as containing multiple perspectives. Their contents must be analyzed with the right set of tools in order to gain a perspective about the problem at hand.
Talking to the conference’s attendees, I got the impression that more companies than ever before are starting to view data as an invaluable asset and a potential key to their success. They are no longer intimidated by data volumes and are using the new generation of big data management and analysis tools to bring more data under their control.
Strata was a great conference that brought under one roof the leaders in big data thinking and doing. It also showed that, though increasingly important, this is still a small community, and in many respects its overall size has not changed since the time I was one of the analysts. We all need to find ways to accelerate the education of new data scientists and their introduction to the market. The ability of many companies to continuously innovate, become leaders, and remain in that position could largely depend on their ability to recruit data scientists who can effectively exploit their big data assets.