ARCHITECHT Daily: Facebook shows why data, not AI, is the thing that really matters

By ARCHITECHT • Issue #80
Facebook published a blog post on Monday highlighting how it’s able to analyze, classify and serve up its users’ billions of photos via a system called Photo Search. On the surface, the post highlights Facebook’s prowess across several big data domains, including deep learning, graph analysis and indexing. Being able to return relevant image results to a user’s query is hard enough, but doing so with acceptable latency requires some particularly clever thinking about how to design the system.
However, below the surface (or perhaps between the lines) lies the ultimate truth about Facebook’s artificial intelligence engineering: None of it would matter very much if Facebook didn’t have so much valuable data. While the sheer scale of its photo data might present some challenges, the real complexity occurs because Facebook wants to do more than just show a user all of her cat photos. It wants to show her friends’ cat photos, as well, and ads and businesses and whatever related content it can serve up. 
Our photos, connections, wall updates, likes, etc., etc., etc., all provide Facebook with valuable data about who we are, who we know and what we like. Its investments in AI, big data infrastructure, network engineering and everything else are essentially a means to get us to share more data (because, hey, it’s free and fast), and then to analyze it and sell ads against it.
Think about trying to build a successful photo-search service today (something worthy of VC investment, or probably even profitable enough to justify bootstrapping for a prolonged period). It would be a tall order. Even if you built the best photo-search experience ever, you’d still need to convince customers to upload their photos to yet another service, and then to actually pay for it. If you wanted to make money beyond subscriptions, you would need some other type of data against which you could sell ads, recommendations, or whatever else you aimed to productize.
The big problem, of course, is that Facebook, Apple, Google and Flickr already have all our photos and offer search for free or next to free. Facebook and Google, in particular, also have petabytes upon petabytes of other data they can use to target users with ads, recommendations, new products, you name it. The big data infrastructure, the deep learning models and everything else they do exist to serve the data, not the other way around.
However, as I wrote last week, Pinterest spotted an opportunity where there was still room to innovate, and amassed a seemingly perfect dataset on top of which to, eventually, apply computer vision models. Lots of images on which to train a model? Check. Clear signals about what users like and might want to buy? Check. The next natural step is to insert ads into results and reap the rewards.
AI presents an enormous opportunity to make money and change the world (if you’re into that sort of thing). AI technologies are also readily available today and will only become more commonplace in the years to come. But everyone can have those: Facebook and Google are literally giving away some of their technologies, in part because they already have such a big head start with so many types of data that open sourcing some tooling doesn’t really matter much.
AI hasn’t changed the lessons of the past decade of big data, except for providing some powerful new means by which to process it and analyze it. The data is still the thing that matters most. 

Sponsor: Cloudera
Listen to the latest ARCHITECHT Show podcast
Las Vegas CIO on how cloud, AI and open data keep Sin City safe and savvy
In the latest episode of the ARCHITECHT Show, City of Las Vegas CIO Mike Sherwood shares some details on how he’s trying to make the city more secure, more innovative and more efficient.
Artificial intelligence
Among new CEO Jim Hackett’s stated responsibilities: “Modernizing Ford’s business, using new tools and techniques to unleash innovation, speed decision making and improve efficiency. This includes increasingly leveraging big data, artificial intelligence, advanced robotics, 3D printing and more.”
This is an interesting assessment of how Google and Microsoft approached AI at their respective developer conferences. However, vision is not the same as building products people use/want to use.
Google’s Fei-Fei Li says it’s the killer app for AI, but that’s probably overstating it. It’s easy to think of use cases for computer vision, but they’re not all game-changing in the way that simply crunching numbers might be.
Of course, computer vision is going to be important in many circumstances, and datasets like this will only aid in that. Researchers can train models on 400 actions and 400 clips per action. 
arxiv.org
A picture is worth about 4,000 words, in this case, as the author analyzes pics of Google’s Tensor Processing Unit cluster to figure out how it’s all put together.
One thing worth noting here is that neither Microsoft nor Google talks much about using GPUs internally (although they almost certainly do). Microsoft utilizes FPGAs and Google builds ASICs. You have to wonder how homogeneous, or not, their processor footprints will be in a decade.
This gets to the point above about what types of processors will be running our AI workloads going forward. Neuromorphic chips, memristors, etc., aren’t yet commercially available. What happens when they are?
You might have noticed that CPUs often get overlooked in discussions about AI processing. This research is evidence that chip giant Intel, despite its FPGA and next-gen chip businesses, isn’t going to let CPUs be ignored without a fight.
arxiv.org
This HBR article seems both overly optimistic and entirely possible, if businesses are willing to get creative and maybe sacrifice a little profit. It’s not just training people for tech jobs but, for example, turning a Walmart into a mall of sorts.
hbr.org
This is one of those ideas that’s funny in premise but not so much in practice. I love the comments explaining how an actual AI would function compared with the author’s ideas.
Sponsor: DigitalOcean
Cloud and infrastructure
So many people are now trying to present at big AI conferences that cloud providers (save for AWS, it seems) didn’t have enough GPUs. I suspect that will change by next year. Will conferences like NIPS become the new Black Friday in the cloud?
Amazon CTO Werner Vogels discusses the thinking behind Amazon Aurora—"the fastest-growing service in the history of AWS"—and points to a new paper breaking it down in greater detail. 
Segment CTO Calvin French-Owen was on the ARCHITECHT Show podcast last month talking about how the company cut $1 million off its annual AWS bill. Here are more details on how, exactly, they analyzed their environment.
segment.com
I tend to agree with commenters that this is probably a non-story, but there is an important debate to be had about support for outdated services, SLA terms and the like.
Basically, the company is trying to automate network configuration changes the same way new tools automate changes to application code, including making sure nothing (even compliance) is broken.
Here’s a deep dive into Facebook’s Telecom Infra Project, which is looking to do for telco gear what the OCP did for data center gear. If ad revenue ever starts slipping, Facebook has a future in infrastructure.
This is a good assessment of where OpenStack presently exists in a still-large market for on-prem software. It’s always worth remembering, though (as the author no doubt knows), that container-based platforms can deliver, and already are delivering, private clouds to some large companies.
If we keep having level-headed discussions about what serverless computing is and how to use it and think about it, it might be less revolutionary than initially predicted. Still really useful, though.
P—which was created by Microsoft Research, UC-Berkeley and Imperial College London—helps with eliminating bugs in asynchronous, event-driven applications.
Georgia Tech researchers say organizations could prevent certain types of cyberattacks by analyzing their network traffic in greater detail, because malware must communicate with its masters via the network.
There are probably a lot of reasons for this, not the least of which is the volume of legacy systems and software hanging around. A total federal infrastructure overhaul might be a good investment over the long run.
This is pretty deep into the world of shuffling algorithms for encrypted data, but any progress toward secure data (even from cloud providers) is positive progress.
arxiv.org
Media partner: GeekWire
All things data
This is hardly breaking news, but it’s worth keeping an eye on as AI makes algorithms even smarter, and banking jobs continue to evolve. The link is to a paywalled WSJ story; here are some highlights from Axios.
www.wsj.com
It appears that if Uber thinks you might have more money, it will charge you more. This is the counter-example to the usual concerns about data-based discrimination, but no less (OK, maybe slightly less) troubling.
Research into DNA as a storage medium made a small splash a few months ago because it’s so dense—and so novel—but the general consensus was it’s a long way off commercially. Microsoft is really pushing the envelope here.
Kodiak Data’s new MemCloud service wouldn’t stand out much, except that it’s targeting big data workloads in a modern sort of way. It’s similar in theory (if not technically) to what Mesosphere does with DC/OS, but focused on big data workloads.
Color-coding (or otherwise indicating) the relative statistical significance of findings in data-analysis software is a good idea, but it will need to get into a product like Tableau, Qlik, etc., to really make a difference.
Listen to the ARCHITECHT Show podcast. New episodes every Thursday!
ARCHITECHT
The most interesting news, analysis, blog posts and research in cloud computing, artificial intelligence and software engineering. Delivered daily to your inbox. Curated by Derrick Harris. Check out the Architecht site at https://architecht.io