First things first

I wasn't even planning to send an issue today, because travel, but then I arrived at my hotel and saw the news of the Cloudera-Hortonworks $5.2 billion merger. And I felt compelled to write something, especially considering how closely I used to cover these two companies and the Hadoop space overall. So, here's my brief and disjointed take, as well as some links.

Plenty of stories were written about Cloudera and Hortonworks planning to merge, and you can probably get the same general details from reading any of them. But for background, I'll point you to articles by my former colleagues Jordan Novet and Tom Krazit:

And, for good measure, you can also check out the Cloudera blog post, Hacker News discussion, and this short but sweet "big data obituary" from Gartner.

I think anyone not directly involved with this deal who says they aren't surprised is lying. Sure, a merger like this actually makes quite a bit of sense when you think about it, but these are two companies that spent years as mortal enemies. (A colleague reminded me of this piece I wrote on their rivalry back in 2011.) The idea that they would unite as a single company is definitely surprising -- and would have been unthinkable even a couple years ago.

But it does make sense. The Hadoop market never panned out like so many people thought it would, which left the companies in it (1) running away from the "Hadoop vendor" label and (2) searching for business lines to validate all their investment in that technology stack. Cloudera seemed to target the data warehouse and data science/machine learning side of things, while I think Hortonworks was doing some interesting things around the internet of things and edge computing. Now they can work on bringing all this stuff together into an entity that stands a better chance of surviving -- and even thriving -- in an IT industry that's much different than when Hadoop hit the scene a decade ago.

The world probably didn't need two companies each working off a similar base, but also supporting their own technologies (open source or not) around security, storage engines, analytics, governance and the like. Apparently, the powers that be at Cloudera and Hortonworks were wise enough (and mature enough) to see this and do the right thing. It will be very interesting to see how they bring their various technologies together, a project they acknowledge will take a few years.

Business-wise, the issue isn't so much revenue (both companies have been seeing nice gains) as it is losses (both have still been losing money every quarter).

People are correctly asking what this merger means for MapR -- the third of the original Hadoop platform companies -- and I honestly don't know. MapR was always ahead of the curve in terms of pushing the open source business model and innovating around file systems, databases, containers, etc., but you have to imagine it's suffering from the same factors that forced the merger of its two bigger, publicly traded rivals. There used to be talk of a MapR IPO; I haven't heard enough about that company recently to get any sense of whether that's still a real possibility.

What are the factors that made the Hadoop/big data/whatever market so difficult in the end? Here are a handful, but they're all related:

  • Artificial intelligence / machine learning
  • Cloud computing (including storage, managed services and open source activity)
  • Spark
  • Kubernetes
  • Other open source projects (probably including, but not limited to, Elastic, Kafka, Flink and any number of databases)

Basically, much of the world moved on from heavy data-infrastructure projects and wanted to do things faster, easier and cheaper. The Hadoop ecosystem was an able flag-bearer for big data in its early days, but other projects and whole new industries (AI, for example) were able to evolve on their own outside of Hadoop, and then start integrating with one another because that's how open source works.

The result is whole new data architectures, application architectures, development processes and user expectations, most of which Cloudera and Hortonworks weren't really in any position to influence. They have adapted and integrated where necessary (Spark, Kubernetes, TensorFlow, etc.), but it seems like a perpetual game of catching up to massive cloud providers on one hand and fast-moving open source communities on the other.

There's probably also an angle here about the amount of capital these companies raised, but I'm not going to dive into it. Except to note that there are also a bunch of database companies (including in the NoSQL space) that have raised significant funding over the past several years and are possibly struggling to find a satisfactory exit. It wouldn't surprise me to see the new Cloudera/Hortonworks do an acquisition or two here to flesh out the full data-management story, or even to see companies in the database space do their own mergers in order to better compete.

Finally, these two ARCHITECHT Show podcast episodes seem relevant today:

And if you're feeling suddenly interested in the whole Hadoop/big data/whatever-you-want-to-call-it space, I would just peruse the podcast catalog here or here. There are a bunch of interviews with the folks behind Spark, Kafka, Elastic, Flink, Neo4j, Timescale and more, plus lots of discussions around Kubernetes, AI, open source and more that shed some light on the greater world in which "big data" now must operate.
