When Salesforce.com and Oracle announced a renewal of their vows last year, I was kind of surprised. Why would Salesforce deepen its reliance on the Oracle database at a time when others across the industry were lessening their reliance on that venerable and not inexpensive platform?
To be fair, the Oracle database has excellent high availability features, and decades of query optimisation built in. It’s a rock-solid database for traditional relational workloads.
But pretty much every business has a data challenge that can’t easily be met with a traditional database, because licensing drives the costs up: call data records in telecoms, for example. Oracle’s problem, and that of its customers, isn’t scale so much as cost and inflexibility. What is a system of record if you have to throw away most of your logs in order to keep costs down? I am using Oracle as shorthand here: IBM DB2 and even the once low-cost alternative, Microsoft SQL Server, were all designed for an era where software licenses were king. But this is a different era. Whether you love or hate the term Web Scale, it captures something of the change.
Open source data stores are opening new opportunities for businesses to solve those data management problems they had previously parked, and creating entirely new businesses. Even the Wall Street Journal is writing about data as currency these days. We’re seeing enterprise sales across the board from open source and cloud data platforms and their vendors, such as Amazon Redshift and SimpleDB, Cloudera, Couchbase, Hortonworks, MapR, MongoDB, DataStax and Databricks.
So if everyone else is zigging, why not Salesforce, given its competitive relationship with Oracle? Every customer dollar spent on Oracle is a dollar less on R&D or customer-facing operations. Given that Salesforce has a really solid Postgres competence at its Heroku subsidiary, that seemed a natural place to invest.
So what would a system of record for every event look like, without requiring a pre-specified, relational-style data model? Increasingly the answer to that question is found in the wider Hadoop ecosystem. The Big Bucket of Bits is the Hadoop Distributed File System (HDFS), with a number of technologies designed to take advantage of that data pool. See, for example, Spark, the new hotness in reading and processing data from Hadoop data sources. Pivotal, meanwhile, is putting its weight behind Tachyon for “data lakes”.
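To make the "no pre-specified data model" idea a bit more concrete, here is a minimal sketch in Python of schema-on-read: events of any shape are appended as JSON lines, and structure is only imposed when the data is queried. This is purely illustrative. The in-memory buffer stands in for a file in a distributed store such as HDFS, and the names `append_event` and `scan` are hypothetical helpers, not part of Hadoop, Spark or anything Salesforce has described.

```python
import io
import json


def append_event(log, event):
    """Append one event as a JSON line; no schema is enforced at write time."""
    log.seek(0, io.SEEK_END)  # always write at the end of the log
    log.write(json.dumps(event) + "\n")


def scan(log, predicate):
    """Read every event back and yield those matching the predicate."""
    log.seek(0)
    for line in log:
        event = json.loads(line)
        if predicate(event):
            yield event


# Hypothetical stand-in for a file in a distributed store.
log = io.StringIO()

# Events with different fields sit side by side in the same log.
append_event(log, {"type": "login", "user": "alice", "ip": "10.0.0.1"})
append_event(log, {"type": "call", "caller": "555-0100", "seconds": 42})
append_event(log, {"type": "login", "user": "bob"})

# The "schema" is decided only at query time.
logins = list(scan(log, lambda e: e.get("type") == "login"))
login_users = [e["user"] for e in logins]
```

The trade-off is the usual one: writes are cheap and nothing gets thrown away, but every query has to cope with whatever shapes ended up in the log.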
So let’s get back to Salesforce then, shall we?
Today Force.com is a system of record, somewhat constrained by the cost of the relational database. Going forward, however, Salesforce will increasingly offload storage to a Hadoop store, with SQL query support based on Apache Phoenix, a layer on top of Apache HBase, for data that is not based on its traditional business object model. A good example is a logging service for compliance, which will take advantage of Phoenix (Salesforce are core committers and project leads). In other words: Oh Hai, Apache.
Logging and compliance are not an accidental use case: Splunk has created a billion-dollar company based on this idea, and needless to say it doesn’t run on Oracle.
Salesforce is not going to replace Oracle at the core any time soon, but it is going to use HBase and Phoenix at the edges as a pluggable architecture to offer customers. The blob storage will make particular sense for read-only data.
Salesforce needs a Big Data play for its customers and ISVs, or they’ll simply go elsewhere, particularly given the prevalence of, and innovation in, data stores, and the ability to spin up IaaS to run them so easily.
It will be interesting to see if the new architecture gets a mention today when Salesforce rolls out its new analytics architecture, codenamed Wave.
Replacing Oracle is hard, but augmenting is not. We can expect to see similar patterns in the enterprise. Just as companies still run mainframes today, Oracle database isn’t going away any time soon. But its period of outright dominance, as the status quo, is now over.
It isn’t saying so explicitly, but for Salesforce, Oracle is a legacy technology. Managed decline is the order of the day. As with the mainframe, Oracle capacity will grow, but distributed data is going elsewhere.
(Cross-posted @ James Governor’s Monkchips)