Cloudera’s continuing focus on the implications of explosive data growth has led it to another key partnership, this time with Informatica. Connecting to the dominant player in data integration and data quality expands the opportunity for Cloudera dramatically; it enables the de facto commercial Hadoop leader to find new ways to empower the “silent majority” of data. The majority of data is outside: not just outside enterprise data warehouses, but outside RDBMS instances entirely. Why? Because it doesn’t need all the management features database management software provides; it doesn’t get updated regularly, for example. In fact, it may not be used very often at all, though it does need to be persisted for a variety of reasons. I recently mentioned Cloudera’s success; the company is going to be challenged by some big players in 2011, notably IBM, whose recent focus on Hadoop has been remarkably nimble. So these deals matter. A lot. The Data Management function is being refactored before our eyes; both of these vendors will play a part in its future.
Informatica has been on a roll, upping its Data Quality game, acquiring and integrating Siperian for Master Data Management (MDM), 29West for streaming data at low latency, and, with Informatica Cloud 9, delivering a multi-tenant platform-as-a-service for data integration, as well as Amazon EC2 support. Informatica’s Q3 2010 results continued a record of 30 consecutive quarters of year-over-year growth. Total revenues grew 31% to $161 million; new license revenues were up 40% to $70 million. Analyst reports continue to acknowledge the success of its strategy to unify data integration-related product categories into a comprehensive set of offerings.
Informatica’s opportunity is huge here: bringing the kinds of tools it provides for structured data to the “extreme data” world is the next step for the pioneering user companies. The huge volumes of new information coming from web consumer applications, newly instrumented physical assets, mobile devices and more are typically being processed with new tools and new languages, outside conventional enterprise methods and teams. They contain priceless opportunities for new combinatorial approaches that will deliver new insight, new processes, and entirely new businesses. As one example, consider the kind of intensely computational scoring of customer data that firms are moving into massively parallel farms running MapReduce over HDFS. Now think about how the profile of an (anonymous but describable) customer in Hadoop could be related to a self-described customer in CRM to enable real-time cross-selling, and you see where the two worlds overlap.
How to build the workflow needed to put this together? One way would be to take mappings designed in Informatica, convert them into MapReduce (MR) jobs and user-defined functions (UDFs), and execute them on Hadoop. That’s one of the deliverables the two companies promise out of their partnership. The connections will be made via a scalable connector to HDFS, leveraging Sqoop in a fashion not unlike the one I described in covering the Cloudera-Membase deal, or the work Cloudera has done with Vertica, Teradata and others.
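To make the idea of turning a mapping into MR plus UDFs a little more concrete, here is a minimal sketch in plain Python. It is an illustration only, not how Informatica or Cloudera actually implement the conversion: the `MAPPING` structure, the two UDFs, and the `run_job` helper are all hypothetical names invented for this example, and the "job" runs in a single local process rather than on Hadoop.

```python
from collections import defaultdict

# Hypothetical UDFs: small, reusable per-field transformations.
def normalize_email(value):
    return value.strip().lower()

def to_cents(value):
    return int(round(float(value) * 100))

# Hypothetical declarative mapping: target field -> (source field, UDF).
# This stands in for a mapping designed in a data integration tool.
MAPPING = {
    "customer": ("email", normalize_email),
    "amount_cents": ("amount", to_cents),
}

def compile_mapper(mapping):
    """Turn the declarative mapping into a map function."""
    def mapper(record):
        out = {target: udf(record[source])
               for target, (source, udf) in mapping.items()}
        # Emit a (key, value) pair, as a MapReduce map step would.
        yield out["customer"], out["amount_cents"]
    return mapper

def reducer(key, values):
    """Aggregate all values that share a key."""
    return key, sum(values)

def run_job(records, mapper, reducer):
    """Local stand-in for map, shuffle and reduce on a cluster."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # the "shuffle": group values by key
    return dict(reducer(k, v) for k, v in sorted(groups.items()))

records = [
    {"email": " Ann@Example.com ", "amount": "12.50"},
    {"email": "ann@example.com", "amount": "0.25"},
    {"email": "bob@example.com", "amount": "3.00"},
]
totals = run_job(records, compile_mapper(MAPPING), reducer)
```

The point of the sketch is the separation of concerns: the mapping stays declarative and reusable, the UDFs carry the field-level logic, and only the generated map and reduce functions need to know anything about the execution engine.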
Informatica brings a far broader value proposition to the table than task execution. Data governance, metadata management, and administration for “outside” data are absent in most of the shops that are using that data more and more. Enabling reuse and providing manageability and reliability will become more important as the uses of outside data proliferate. Informatica has already demonstrated it can provide huge value to the enterprise data management function without actually being a database player. Perhaps it should consider whether partnership or ownership is the best model here.
Disclosures: neither Cloudera nor Informatica is a client of IT Market Strategy.