Learning from DeepMind's data-privacy woes

By ARCHITECHT • Issue #106
I assume readers in the U.S. are getting ready to check out and get a jump on Independence Day, so I’m sending this early and keeping it relatively lightweight.
But you’ll definitely want to pay attention to the latest in the continuing saga of Google DeepMind and the UK government’s consumer-data watchdog, the Information Commissioner’s Office. The big news on Monday is that the ICO confirmed that DeepMind and the UK National Health Service did indeed violate data-privacy laws with an agreement they entered into in 2015. 
I wrote about this investigation back in March, and you can get more details in that post, as well as here:
The short version is that DeepMind entered into agreements to access patient data from hospitals in an effort to train a model that can predict kidney problems and send messages to doctors and nurses in real time. The ICO found that the hospitals gave DeepMind more data than was necessary, and that DeepMind used the data in ways it hadn’t disclosed in the agreements.
However, you’ll also want to read DeepMind’s mea culpa—and proclamation of the project’s success—on this particular issue: The Information Commissioner, the Royal Free, and what we’ve learned. What DeepMind learned is that it badly underestimated the complexities of, and rationales for, consumer data-privacy laws, but also that its Streams technology is working very well:
“We’re proud that, within a few weeks of Streams being deployed at the Royal Free, nurses said that it was saving them up to two hours each day, and we’ve already heard examples of patients with serious conditions being seen more quickly thanks to the instant alerts. Because Streams is designed to be ready for more advanced technology in the future, including AI-powered clinical alerts, we hope that it will help bring even more benefits to patients and clinicians in time.”
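If you’re curious what a system like Streams looks like mechanically, here’s a purely illustrative Python sketch of a threshold-based alerting loop. The risk score, field names and send_alert helper are all hypothetical stand-ins, not DeepMind’s implementation; the point is just the shape of the pipeline: score incoming lab results and push an instant alert when risk crosses a threshold.

```python
# Purely illustrative sketch of a Streams-style alerting loop.
# The risk model, thresholds and field names are hypothetical;
# this is not DeepMind's implementation.
from dataclasses import dataclass

@dataclass
class LabResult:
    patient_id: str
    creatinine: float           # mg/dL, a common kidney-function marker
    baseline_creatinine: float  # the patient's own historical baseline

def kidney_risk_score(result: LabResult) -> float:
    """Toy risk score: relative rise in creatinine over baseline."""
    if result.baseline_creatinine <= 0:
        return 0.0
    return (result.creatinine - result.baseline_creatinine) / result.baseline_creatinine

def send_alert(patient_id: str, score: float) -> None:
    """Stand-in for a push notification to clinicians' devices."""
    print(f"ALERT: patient {patient_id} kidney-risk score {score:.2f}")

def process_stream(results, threshold: float = 0.5) -> None:
    """Score each incoming lab result and alert when risk crosses the threshold."""
    for result in results:
        score = kidney_risk_score(result)
        if score >= threshold:
            send_alert(result.patient_id, score)

if __name__ == "__main__":
    incoming = [
        LabResult("p-001", creatinine=1.9, baseline_creatinine=1.0),
        LabResult("p-002", creatinine=1.1, baseline_creatinine=1.0),
    ]
    process_stream(incoming)
```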
The ICO also issued a statement, via blog post, explaining essentially that privacy regulations are both necessary and are not inherently an impediment to innovation: Four lessons NHS Trusts can learn from the Royal Free case.
There’s still a lot of work to be done on the issue of data privacy and AI, in the U.S., the U.K. and, I’ll assume, around the world. On the one hand, AI is remarkably good at discerning patterns from medical data and medical images, and more data is often better. On the other hand, patients need to know that hospitals aren’t giving away their data to third parties without consent, and potentially opening it up to acquisition by nefarious parties.
Matei Zaharia, one of the creators of Apache Spark and a co-founder of Databricks, addressed this issue in a recent ARCHITECHT Show podcast interview, as well. He’s working on the DAWN project at Stanford, which counts among its aims simplifying the preparation of training data for AI models, especially in fields like medicine and other industries where data is hard to come by and even harder (or more expensive) to classify:
“But what if you’re trying to parse medical records and automatically create diagnoses? You can’t put people’s medical records on the Internet for random people to label. … Even if you could, you need doctors to label them. So … it costs $200 per hour or more to do it.”
He added later in the interview:
“[I]f you look at software coming out of Google, for example, it’s really optimized for a Google-style company. They use that internally. That is a company where the applications are all on the web, so there’s a ton of data. …

“… If it gets great results, that’s great, but it doesn’t mean that it’s actually the right software for, say, a medical researcher at the Stanford Hospital who doesn’t have 20,000 PhDs and doesn’t have millions of labeled images. It’s actually a completely different problem for them.”
Projects like DAWN are trying to improve this situation by streamlining the classification process. AI researchers are also working on reducing the amount of data their models need in order to learn accurately, and on producing simulated data that can help train models where actual data is scarce.
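To make “streamlining the classification process” a bit more concrete, here’s a small sketch of the programmatic-labeling idea that work in this space is built around (Snorkel, from the same Stanford group, is the best-known example): instead of paying doctors $200 an hour to hand-label every record, domain experts write cheap heuristic labeling functions whose noisy votes get combined into training labels. The heuristics and keywords below are invented for illustration.

```python
# Illustrative sketch of programmatic labeling (weak supervision).
# The heuristics and keywords are invented; real systems like Snorkel
# also learn a weight for each labeling function rather than majority-voting.
from collections import Counter

POSITIVE, NEGATIVE, ABSTAIN = 1, 0, -1

def lf_mentions_acute_injury(record: str) -> int:
    return POSITIVE if "acute kidney injury" in record.lower() else ABSTAIN

def lf_normal_creatinine(record: str) -> int:
    return NEGATIVE if "creatinine normal" in record.lower() else ABSTAIN

def lf_dialysis(record: str) -> int:
    return POSITIVE if "dialysis" in record.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_mentions_acute_injury, lf_normal_creatinine, lf_dialysis]

def weak_label(record: str) -> int:
    """Combine noisy labeling-function votes by simple majority, ignoring abstentions."""
    votes = [lf(record) for lf in LABELING_FUNCTIONS if lf(record) != ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]

records = [
    "Patient admitted with suspected acute kidney injury, started dialysis.",
    "Routine follow-up, creatinine normal, no complaints.",
]
print([weak_label(r) for r in records])  # e.g. [1, 0]
```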
But this is a problem that’s not going away anytime soon, at least as far as companies are concerned. There’s a disconnect between laws, financial incentives and the types of companies/organizations collecting consumer data, as well as what they’re collecting. For every hospital, for example, there’s a health-care-app provider or other tech company that would love access to its data, and vice versa. And, as DeepMind points out, the benefits to patients and hospital staff can be meaningful—even if someone had to bend some rules in order to get it done.
Smart people have been calling for some action on issues like this for ages—rules balancing privacy, innovation, anonymity and security—but the dawn of the artificial intelligence era seems like the right time to actually accomplish something.

Sponsor: Cloudera
Listen to the latest ARCHITECHT AI podcast
In this episode of the ARCHITECHT AI and Robot Show, Derrick Harris speaks with Demisto co-founder Rishi Bhargava about the state of cybersecurity, and how machine learning can help bring order to the chaos. Bhargava discusses current threat vectors and the shortcomings of many response tactics, and explains how Demisto’s technology uses ChatOps to analyze response behavior, suggest courses of action, and give security personnel a single point of interaction across dozens of tools.
Sponsor: Linux Foundation
Artificial intelligence
This sounds a little like some of IBM’s early claims with Watson, only focused on clinical trials rather than on research literature. This is also one of those situations referenced above, where the more data each party has, the better it might work out.
This is a thoughtful take on the differences between building AI applications and other applications, down to questions like how you effectively debug the system.
This is a good presentation by Nick Bostrom, who coined the term (I think) “superintelligence.” He points out the uncertainty in predictions about when it will occur, highlights some issues, and says he’s less of an AI-doomsayer than his research would have you believe.
We’ve seen a lot of work to overcome the black-box problem in AI, including the research highlights in this news story. Understanding why decisions are made will be critical as AI enters regulated arenas.
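For readers who want something hands-on, one widely used, model-agnostic way to get a first read on a black-box model is permutation importance: shuffle one feature at a time and see how much held-out accuracy drops. This isn’t necessarily what the researchers in the story are doing; it’s just a common starting point. A minimal sketch:

```python
# Minimal permutation importance: shuffle one feature at a time and measure
# how much held-out accuracy drops. Not tied to the research in the linked
# story; just one common way to peek inside a black-box model.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
baseline = model.score(X_test, y_test)

rng = np.random.default_rng(0)
importances = []
for j in range(X_test.shape[1]):
    X_perturbed = X_test.copy()
    # Break the link between feature j and the label by shuffling that column.
    X_perturbed[:, j] = rng.permutation(X_perturbed[:, j])
    importances.append(baseline - model.score(X_perturbed, y_test))

top_features = np.argsort(importances)[::-1][:5]
print("Most influential feature indices:", top_features)
```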
The company is set to announce some big partnerships and new software this week. Don’t bet against Baidu to make some serious waves in this space, especially overseas.
Whether or not the technologies make their way from warehouses to the roads, it’s hard to argue with the logic that warehouse robots can teach us a lot about the interactions between humans and autonomous machines.
This is mostly a business story and not an AI one, but it’s conceivable that a proliferation of smart-home assistants (including the Apple HomePod) will teach us what consumers actually value. 
I linked to a blog post about this a couple months ago, and here’s another take on that topic. I would say the practice seems feasible within several years’ time, but not particularly worrisome on a grand scale.
There’s so much work happening in computer vision that it’s hard to keep up—including for this very application—but I always take notice when Disney is involved. Because, well, it is Disney. 
This isn’t technically about AI, but it’s applicable to anyone working on text or speech recognition. If you’re taking a product globally, make sure it understands and is equipped for different cultures.
DeepMind really brought reinforcement learning to the fore with its video-game-playing systems, and this new research aims to make it even better by giving systems a new way to explore their environments.
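For context on what “exploring their environments” means, the classic baseline such research tries to improve on is epsilon-greedy exploration, shown below on a toy multi-armed bandit. This is emphatically not the method from the paper, just the standard point of comparison: act greedily most of the time, but try a random action a small fraction of the time.

```python
# Classic epsilon-greedy exploration on a toy multi-armed bandit.
# This is the standard baseline, not the exploration method from the linked paper.
import random

true_rewards = [0.2, 0.5, 0.8]          # hidden payoff probability of each arm
estimates = [0.0] * len(true_rewards)   # running estimate of each arm's value
counts = [0] * len(true_rewards)
epsilon = 0.1                           # fraction of the time we explore at random

for step in range(10_000):
    if random.random() < epsilon:
        arm = random.randrange(len(true_rewards))  # explore: pick a random arm
    else:
        arm = max(range(len(estimates)), key=lambda a: estimates[a])  # exploit best guess
    reward = 1.0 if random.random() < true_rewards[arm] else 0.0
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean update

print("Estimated arm values:", [round(e, 2) for e in estimates])
```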
It might sound counter-intuitive but, in fact, AI systems aren’t very good at learning from things they’ve already learned. This is a problem getting lots of attention right now, including from this team at Carnegie Mellon.
Sponsor: DigitalOcean
Cloud and infrastructure
Other reports say thousands of layoffs are coming, but Mary Jo Foley at ZDNet suggests the number might be much lower. At any rate, they’re part of the company’s ongoing shift to selling cloud services and tech.
This blog post from Amazon CTO Werner Vogels explains why the new integration between Amazon Lex (voice API) and AWS Connect (call center service) is so powerful. But the more I think about Connect and its implications, the more I think we’ll look back on this era the way we now look back on Amazon’s move from books to computing.
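If you want a feel for how lightweight the Lex side of that integration is, here’s a rough sketch using boto3’s Lex (V1) runtime client. The bot name, alias and user ID are placeholders for a bot you’d have already built in the Lex console; Connect essentially wires this same exchange into a phone call.

```python
# Rough sketch of calling an Amazon Lex (V1) bot from Python via boto3.
# "SupportBot", "prod" and the user ID are placeholders for a bot you have
# already defined; standard AWS credentials are assumed to be configured.
import boto3

lex = boto3.client("lex-runtime", region_name="us-east-1")

response = lex.post_text(
    botName="SupportBot",   # hypothetical bot name
    botAlias="prod",        # hypothetical alias
    userId="caller-1234",   # any stable ID for the conversation
    inputText="I'd like to check my order status",
)

# Lex returns the recognized intent, filled slots and the bot's next prompt.
print(response.get("intentName"), response.get("dialogState"))
print(response.get("message"))
```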
This is from a publication targeting government agencies, but it applies to anyone adopting open source software. Community, vendors and your own contributions will matter a lot.
Blockchain is obviously one of the biggest things happening in tech right now, but this article suggests the tech will look a lot different a decade down the road when it really starts going mainstream.
This podcast addresses what I think are some well-understood concepts today, at least at a high level. Namely, that quantum computing will need to embrace the cloud to thrive, and that knowing that from the beginning is a good thing for the companies involved.
Sponsor: CircleCI
All things data
I have been linking to lots of stories about Alibaba lately, including this one on how it sees data science as a strategic advantage. I don’t think we’re close to seeing how big that company will get.
I linked last week to a blog post about how Segment is using this in its data-integration service, and then I kept seeing it pop up in other places. So here’s the info, and why it matters, straight from Confluent.
There are some fair points here about why it’s too early to take for granted that we’re going to see heavy analytic workloads running on edge devices. I think there will certainly be some, but whether they happen on devices or in edge data centers (before being sent to the wider cloud) is the bigger question.
Completely by coincidence, I first heard of Astronomer last week, and then I saw this blog post about how it uses DC/OS—a tech I know very well. This is a fair assessment of its pros and cons, especially as the foundation for a data-centric platform.
Sponsor: Bonsai
ARCHITECHT
The most interesting news, analysis, blog posts and research in cloud computing, artificial intelligence and software engineering. Delivered daily to your inbox. Curated by Derrick Harris. Check out the Architecht site at https://architecht.io