ARCHITECHT
ArchiTECHt Daily: Facebook exposes the limits of Hadoop (again)
By ARCHITECHT • Issue #11
This is an edited excerpt of a post that originally appeared on the website on Saturday.
On Friday, Facebook published a blog post detailing a new open source system called Beringei. It’s a time-series storage engine that the company uses to serve real-time data about system health to both the humans and the automated systems tasked with keeping Facebook online. Because of the scale and real-time nature of Facebook’s operations—"Beringei currently stores up to 10 billion unique time series and serves 18 million queries per minute"—the company had to replace its previous HBase data store for this workload.
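A big part of why purpose-built time-series engines like Beringei outperform general-purpose stores is aggressive compression of regularly sampled metrics. One core trick from Facebook's work in this area is delta-of-delta timestamp encoding: for metrics reported on a fixed interval, the change between successive deltas is almost always zero, so the stream compresses to nearly nothing. Here's a minimal, illustrative Python sketch of that idea—an assumption-laden toy, not Beringei's actual C++ implementation (which packs these values into variable-length bit fields):

```python
def delta_of_delta(timestamps):
    """Encode a sorted list of timestamps as delta-of-deltas.

    The first value is stored raw and the second as a delta; every
    subsequent point stores only the change in the delta, which is
    zero for regularly spaced metrics and so compresses very well.
    """
    if len(timestamps) < 2:
        return list(timestamps)
    encoded = [timestamps[0], timestamps[1] - timestamps[0]]
    prev_delta = encoded[1]
    for prev, curr in zip(timestamps[1:], timestamps[2:]):
        delta = curr - prev
        encoded.append(delta - prev_delta)
        prev_delta = delta
    return encoded


def decode(encoded):
    """Invert delta_of_delta, recovering the original timestamps."""
    if len(encoded) < 2:
        return list(encoded)
    out = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for dod in encoded[2:]:
        delta += dod
        out.append(out[-1] + delta)
    return out


# Metrics sampled every 60 seconds, with one sample arriving a second late:
ts = [1000, 1060, 1120, 1181, 1241]
enc = delta_of_delta(ts)   # [1000, 60, 0, 1, -1]
assert decode(enc) == ts
```

Note how the steady 60-second cadence collapses to zeros, and even the jittery sample costs only a small correction—this is why a dedicated engine can keep billions of series hot in memory.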
Beringei signifies yet another step in the evolution of how the company is using Hadoop. And where Facebook goes, the industry tends to follow.
In part, this is because when Facebook starts outgrowing a technology, it’s a pretty good indicator that other large users of those technologies might soon start running into similar issues. For example, Facebook created the Hive data warehouse system for running SQL queries on Hadoop data, only to replace it with a much faster system called Presto in 2013.
Since then, Presto has amassed an impressive list of users. What’s more, Hadoop vendors Cloudera, Hortonworks and MapR—plus any number of startups, and even users such as Salesforce—have developed their own low-latency Hive alternatives.
Recently (and somewhat ironically), Facebook jumped onto the Apache Spark bandwagon in a big way, after putting Spark through its paces to ensure it could handle Facebook’s giant batch-processing workloads. Spark was created, and became hugely popular, as a simpler, faster alternative to Hadoop MapReduce.
But Beringei is different from previous Facebook creations such as Hive, Presto or Corona because it doesn’t require Hadoop at all. While it’s a narrow-enough use case that it alone won’t likely have a material effect on HBase usage, or certainly on the overall market for Hadoop software, Beringei might play a small role in a future scenario where Hadoop just isn’t on the radar for many companies.
Already, startups not yet dealing with big data (at least by today’s definition) can likely afford to bypass Hadoop and its relative complexity altogether, opting instead to build around more modern, right-sized technologies from the start. Beringei and Spark are two examples among a sea of open source databases, data stores and real-time processing engines now available.
And as engineers leave companies like Facebook and other early adopters to start their own companies or join others, we might expect them to become software Johnny Appleseeds, spreading these technologies and practices across new lands. You hear frequently about ex-Google engineers missing its famous Borg system once they leave, for example, or about engineers having to rebuild the same thing over and over as they move from job to job. But the beautiful thing about the open source era is that it’s much easier now to take your favorite tools with you.

Source: Facebook
What's new on ArchiTECHt
The Siebel Systems founder discusses his new company, C3 IoT, and how cloud computing and big data are making industrial-scale IoT a possibility. Continue reading on ArchiTECHt »
Around the web: Artificial intelligence
Google says it has developed a method for labeling video footage, which will help AI models identify objects as they move across the screen. I would expect to read more about this soon.
arxiv.org
If you live in one of the 25 cities where Google Maps for Android now predicts parking availability for your destination, this blog post explains how they built that model.
From roboticist Rodney Brooks, a history of Moore’s Law and a prediction that new architectures will spur advancement in machine learning, quantum computing and more. 
That poker-playing robot in the news is a prime example of where AI research is heading. The field takes on a whole new level of complexity when you move beyond classification.
Speaking of new directions in AI, tech media site Quartz built an Atari-playing system using the OpenAI Universe package. You can watch it learning here.
qz.com
This isn’t an artificial intelligence-based challenge, but it’s interesting, timely, and I assume a deep learning model will win it.
Around the web: Cloud and infrastructure
Researchers have created a system called ArchiveSpark that uses Apache Spark to make it faster and easier for researchers to utilize the immense amount of content on archive.org and other archiving sites.
arxiv.org
A nice description of the Lambda-like, or functions-as-a-service, project backed by IBM and Adobe. It’s actually a fairly complex system in terms of components; hopefully deployment/management is simplified.
This is a good look at some of Microsoft’s recent decisions around re-architecting its backend systems and, more specifically, building out a scalable Git source control system.
Amazon Web Services gets dinged for not being open, but it’s the only cloud provider here with straightforward revenue numbers, from what I can tell. And the biggest.
Business implications aside, it seems like satellites will some day be part of our vaunted edge networks. Also, IIRC, Skybox was doing some interesting things with big data for image processing before the Google acquisition.
Around the web: Security
… in the right circumstances, of course. A bittersweet story from a DARPA contest that shows how well experts and models can work together. We just have to hope the good guys have better AI; they rarely have more motivation.
What is the level of accuracy and anonymity (and performance) where we would trust AI to do real-time deep-packet inspection, for example, and try to really take a bite out of consumer malware?
Researchers from the University of Ottawa devised a method of hacking quantum data transmissions by cloning the photons carrying the qubits. On the bright side, they also identified some telltale signs of quantum hacking.
The most interesting news, analysis, blog posts and research in cloud computing, artificial intelligence and software engineering. Delivered daily to your inbox. Curated by Derrick Harris. Check out the Architecht site at https://architecht.io