ARCHITECHT Daily: Data science isn't dead

By ARCHITECHT • Issue #84
I know, I know. Of course data science isn’t dead, and only an idiot would say it is. 
However … it does seem like data science has lost a little bit of its buzz over the past couple of years, as deep learning and AI became the shiny new things everyone obsesses over. Now people talk about how to get into AI instead of what skills a data scientist needs. Analysts and the Harvard Business Review talk about the shortage of AI experts instead of data scientists.
But while AI is attracting all the attention, data scientists are keeping at it somewhat out of the spotlight. They’re certainly incorporating AI models at times, but it turns out that tried-and-true tools like SQL and visualization still have some utility left in them.
Here are three posts that highlight the continued importance of data science in a world dominated by talk about AI:
Airbnb is running its own internal university to teach data science (TechCrunch): Airbnb credits the classes with boosting weekly active usage of data tools from 30 percent of the company to 45 percent. It seems reasonable that people will do this, and learn from it, if it’s part of the job and not an extra-curricular activity.
Designing a faster, simpler workflow to build and share analytical insights (New York Times Open blog): The data analytics team at the New York Times shares details on moving to a combination of Google BigQuery and Chartio. Among the big wins were being able to automate queries and centralize around standard tooling in a previously ad hoc analytics environment.
How data science helps power worldwide delivery of Netflix content (Netflix Tech Blog): Netflix is a pretty well-known data science firm, and has even been an early adopter of deep learning for fairly novel tasks such as dynamically optimizing pictures to meet available bandwidth. This post explains how the company uses data science to predict the popularity of content and ensure its CDN is caching the right content at the right place at the right time.

Sponsor: Cloudera
Listen to the latest ARCHITECHT Show podcast
Merlon Intelligence founder and CEO Bradford Cross talks about his new startup, which uses artificial intelligence to help banks root out money laundering and other regulatory issues. Cross is also a partner at Data Collective Venture Capital, an experienced machine learning entrepreneur (with media-curation startup Prismatic), and a hedge fund veteran. He also discusses the unnecessary proliferation of bots, the real opportunities for AI startups, and why cloud providers might be chasing a dead end with general-purpose AI services.
Artificial intelligence
This is pretty much going to be the norm now. It’s what Merlon (above) is doing, more or less, and everyone from PayPal to large banks seems to be incorporating AI for anti-fraud in various ways.
This is a good writeup about the legal battle between Uber and Waymo, but also the disparate state of that industry. It’s expensive, and the sensors and the AI models aren’t interchangeable so everyone is working in a vacuum.
I’ll be honest, I still find myself manually typing a lot of words on my Android phone. Although it has learned my proclivity for swearing.
Deep Voice 2 can mimic hundreds of voices. But it will never replace Billy West!
This is a comforting line from the abstract of this paper: “Researchers believe there is a 50% chance of AI outperforming humans in all tasks in 45 years and of automating all human jobs in 120 years.” 
arxiv.org
This is probably beyond the scope of most hackers at the moment, but you know it’s only a matter of time before someone tries. Good to get the white hats on it early.
arxiv.org
This is about improving performance, but I think there’s an argument for interoperability (probably not the right word here) of models, as well. Like perhaps there’s a baseline knowledge that all AI systems should have and share about what’s what.
arxiv.org
It’s essentially a two-model system where one is producing pseudo-data to keep the other learning. The goal is to have AI systems that can perform multiple tasks without suffering “catastrophic forgetting.”
arxiv.org
Sponsor: DigitalOcean
Cloud and infrastructure
I don’t think this is a surprise to anybody. Curiously, though, SMBs (250-999 people) are the only ones bucking the trend. Is there a certain size of business that’s a dead zone for the cloud for some reason (too big/small, too little budget)?
This could have something to do with that story yesterday about Microsoft claiming it’s the only “legal” U.S. cloud in China. There’s value in assuring customers there that there won’t be trouble.
Because of course it is. Can it run some to my house, as well?
Storage is not something I care too much about (not that I’m suggesting it’s not important), but Pure’s steady growth in the cloud-first era stood out to me. What’s the limit on storage revenues going forward?
Switch is growing its data center footprint globally after starting with its famous SuperNAP in Las Vegas. It counts some big-name companies and government agencies among its customers; let’s assume some of Atlanta’s household names will be housed there.
This post does a reasonable job explaining the various options for cloud-native platforms (although it doesn’t exactly nail the Mesos part), but also sets up a strawman. No one is claiming it’s an easy move from one platform to another (except, of course, for the containers), but any platform will ease the move from one cloud to another.
There are some fair points in this post, but I don’t think the cat is going back into the bag here anytime soon. Too many experienced people are committed and don’t want to go back to the way things were.
aadrake.com
There are some interesting quotes from Tanium’s CEO in this piece about the cybersecurity company’s $100 million investment, and the repercussions of recent allegations about how employees are treated.
fortune.com
Media partner: GeekWire
All things data
I don’t know a lot about DataRobot (other than that it’s an applied machine learning company), but I covered Nutonian a few years ago and they seemed to be a smart bunch with solid tech. The rationale appears to be applying Nutonian’s time-series predictions to IoT data.
fortune.com
Instacart has been pretty open lately about what it’s working on, even sharing some of its datasets on shopping habits. Here’s a podcast where its VP of data science shares more.
Or so says a Cloudera engineer, citing reasons from security to bugginess when running Docker containers directly on YARN. Not sure how long that will be a big concern in a world of competitive schedulers anyhow.
Listen to the ARCHITECHT Show podcast. New episodes every Thursday!
ARCHITECHT
The most interesting news, analysis, blog posts and research in cloud computing, artificial intelligence and software engineering. Delivered daily to your inbox. Curated by Derrick Harris. Check out the Architecht site at https://architecht.io