ARCHITECHT
Learning from DeepMind's data-privacy woes
By ARCHITECHT • Issue #106
I assume readers in the U.S. are getting ready to check out and get a jump on Independence Day, so I’m sending this early and keeping it relatively lightweight.
But you’ll definitely want to pay attention to the latest in the continuing saga of Google DeepMind and the UK government’s consumer-data watchdog, the Information Commissioner’s Office (ICO). The big news on Monday is that the ICO confirmed that DeepMind and the UK National Health Service did indeed violate data-privacy laws with an agreement they entered into in 2015.
I wrote about this investigation back in March, and you can get more details in that post, as well as here:
The long story made short: DeepMind entered into agreements to access patient data from hospitals in an effort to train a model that can predict kidney problems and send messages to doctors and nurses in real time. The ICO found that the hospitals gave DeepMind more data than was necessary, and that DeepMind used the data in ways it hadn’t disclosed in the agreements.
However, you’ll also want to read DeepMind’s mea culpa—and proclamation of the project’s success—on this particular issue: The Information Commissioner, the Royal Free, and what we’ve learned. What DeepMind learned is that it vastly underestimated the complexities of, and rationales for, consumer data-privacy laws, but also that its Streams technology is working very well:
“We’re proud that, within a few weeks of Streams being deployed at the Royal Free, nurses said that it was saving them up to two hours each day, and we’ve already heard examples of patients with serious conditions being seen more quickly thanks to the instant alerts. Because Streams is designed to be ready for more advanced technology in the future, including AI-powered clinical alerts, we hope that it will help bring even more benefits to patients and clinicians in time.”
The ICO also issued a statement, via blog post, explaining essentially that privacy regulations are both necessary and are not inherently an impediment to innovation: Four lessons NHS Trusts can learn from the Royal Free case.
There’s still a lot of work to be done on the issue of data privacy and AI, in the U.S., the U.K. and, I’ll assume, around the world. On the one hand, AI is remarkably good at discerning patterns in medical data and medical images, and more data is often better. On the other hand, patients need to know that hospitals aren’t giving away their data to third parties without consent, and potentially opening it up to acquisition by nefarious parties.
Matei Zaharia, one of the creators of Apache Spark and a co-founder of Databricks, addressed this issue in a recent ARCHITECHT Show podcast interview, as well. He’s working on the DAWN project at Stanford, which counts among its aims simplifying the preparation of training data for AI models, especially in fields like medicine, where data is hard to come by and even harder (or more expensive) to classify:
“But what if you’re trying to parse medical records and automatically create diagnoses? You can’t put people’s medical records on the Internet for random people to label. … Even if you could, you need doctors to label them. So … it costs $200 per hour or more to do it.”
He added later in the interview:
“[I]f you look at software coming out of Google, for example, it’s really optimized for a Google-style company. They use that internally. That is a company where the applications are all on the web, so there’s a ton of data. …

“… If it gets great results, that’s great, but it doesn’t mean that it’s actually the right software for, say, a medical researcher at the Stanford Hospital who doesn’t have 20,000 PhDs and doesn’t have millions of labeled images. It’s actually a completely different problem for them.”
Projects like DAWN are trying to improve this situation by streamlining the classification process. AI researchers are also working on reducing the amount of data their models need to learn accurately, and on producing simulated data that can help train models where there’s a dearth of real data.
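To make that simulated-data idea concrete, here’s a minimal, hypothetical sketch in Python with NumPy (my illustration, not code from DAWN or the interview). It triples a tiny labeled dataset by generating flipped and noise-perturbed copies of each “image,” a crude stand-in for the synthetic-data techniques used when real, labeled examples are scarce:

import numpy as np

def augment(images, labels, rng):
    """Return the originals plus two synthetic variants of each image:
    a horizontal flip and a copy with mild Gaussian noise. Labels are
    simply repeated, because these transforms don't change what each
    image depicts."""
    flipped = images[:, :, ::-1]                       # mirror left-to-right
    noisy = images + rng.normal(0.0, 0.05, images.shape)
    return (np.concatenate([images, flipped, noisy]),
            np.concatenate([labels, labels, labels]))

rng = np.random.default_rng(seed=42)
X = rng.random((10, 32, 32))         # ten made-up 32x32 grayscale "scans"
y = np.array([0, 1] * 5)             # made-up binary labels
X_aug, y_aug = augment(X, y, rng)
print(X_aug.shape, y_aug.shape)      # (30, 32, 32) (30,)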
But this is a problem that’s not going away anytime soon, at least where companies are involved. There’s a disconnect between laws, financial incentives and the types of companies and organizations collecting consumer data, as well as what they’re collecting. For every hospital, for example, there’s a health-care-app provider or other tech company that would love access to its data, and vice versa. And, as DeepMind points out, the benefits to patients and hospital staff can be meaningful—even if someone had to bend some rules to realize them.
Smart people have been calling for some action on issues like this for ages—rules balancing privacy, innovation, anonymity and security—but the dawn of the artificial intelligence era seems like the right time to actually accomplish something.

ARCHITECHT

ARCHITECHT delivers the most interesting news and information about the business impacts of cloud computing, artificial intelligence, and other trends reshaping enterprise IT. Curated by Derrick Harris.

Check out the Architecht site at https://architecht.io
