First things first

If you use Amazon Mechanical Turk and haven't read this WIRED story from last week, do yourself a favor and read it now. The gist is that some researchers who use MTurk for academic studies have been noticing an uptick in bad responses to survey questions. By bad responses, we're talking about apparent gobbledygook, not just someone rushing through a task to collect their payment.

They fear bots are replacing human Turkers in some fashion and, as a result, are making the platform less reliable for the types of research they're conducting.

If you're asking what Mechanical Turk is, it's basically a platform for paying people small amounts of money to perform simple tasks from their computers. Amazon launched it in 2005, and it's remarkably popular as a tool for labeling images and other data used to train artificial intelligence models.
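For anybody who has only seen MTurk from the worker side, here's a minimal sketch of what posting a task (a "HIT") looks like from the requester side, using boto3 against the sandbox endpoint so no real money changes hands. The title, reward and question markup are invented for illustration:

import boto3

# Minimal sketch: post a HIT to the MTurk *sandbox*. Requires AWS credentials;
# the title, reward and question markup below are invented for illustration.
mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# HTMLQuestion wraps an HTML page shown to the worker; a real task's form
# must post its answers back to MTurk's externalSubmit endpoint.
question_xml = """\
<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">
  <HTMLContent><![CDATA[<p>Is there a dog in this image? (form markup elided)</p>]]></HTMLContent>
  <FrameHeight>450</FrameHeight>
</HTMLQuestion>"""

hit = mturk.create_hit(
    Title="Answer a yes/no question about an image",
    Description="Look at one image and answer a single question.",
    Keywords="image, labeling",
    Reward="0.05",                    # dollars, passed as a string
    MaxAssignments=3,                 # ask three workers per item
    LifetimeInSeconds=3600,           # HIT stays listed for an hour
    AssignmentDurationInSeconds=300,  # each worker gets five minutes
    Question=question_xml,
)
print(hit["HIT"]["HITId"])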

It's the latter point -- that many researchers use MTurk as a source of human-labeled training data -- that's a little worrisome with regard to AI. High-quality training data is a fundamental requirement of today's deep-learning-based approach to AI; without it, even the best ideas don't stand much of a chance of getting off the ground. It's not just mountains of money that make companies like Google, Facebook and Pinterest leaders in AI, but also the mountains of quality data they have.

As the stakes get higher for researchers and AI is rolled out more broadly in public-facing capacities, it's pretty easy to spot the potential problems of a bot-heavy MTurk platform. Even if users are able to identify and disqualify blatantly nonsensical responses or labels, they're still wasting time and money doing so. And if bots are good enough to slip through the cracks, their erroneous labels risk poisoning the accuracy of the resulting models.
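To make that concrete, here's a minimal sketch of the kind of quality control a requester might layer on top: score workers against a few "gold" items with known answers, drop those who fail, then majority-vote the rest. The data format and thresholds are assumptions for illustration, not anything MTurk provides out of the box:

from collections import Counter, defaultdict

# Assumed judgment format: (worker_id, item_id, label) tuples. "Gold" items
# have known answers and double as attention checks; all names are invented.
GOLD_ANSWERS = {"gold-1": "cat", "gold-2": "dog"}
MIN_GOLD_ACCURACY = 0.75  # assumed threshold; tune per task
MIN_AGREEMENT = 2         # require two matching judgments to accept a label

def filter_and_aggregate(judgments):
    # Score each worker against whatever gold items they answered.
    gold = defaultdict(lambda: [0, 0])  # worker -> [correct, answered]
    for worker, item, label in judgments:
        if item in GOLD_ANSWERS:
            gold[worker][1] += 1
            gold[worker][0] += label == GOLD_ANSWERS[item]

    trusted = {w for w, (c, n) in gold.items() if n and c / n >= MIN_GOLD_ACCURACY}

    # Majority-vote the remaining items using trusted workers only.
    votes = defaultdict(Counter)
    for worker, item, label in judgments:
        if worker in trusted and item not in GOLD_ANSWERS:
            votes[item][label] += 1

    labels = {}
    for item, counts in votes.items():
        label, n = counts.most_common(1)[0]
        if n >= MIN_AGREEMENT:  # too little agreement -> send back for re-labeling
            labels[item] = label
    return labels

The specific numbers don't matter much; the point is that redundancy plus known-answer checks turn "are these labels any good?" into a measurable question rather than a leap of faith.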

But whether or not the MTurk-bot scare is for real, I also think there's a broader problem here, which is the reliance on third parties to label data at all. It might not be a huge deal for small experiments, but it seems increasingly irresponsible for work that really matters. While Turkers overall might be reliable, it's also conceivable that poorly paid strangers won't feel too incentivized to do a great job. Obviously, dealing with sensitive data on the platform would be a pretty bad idea from the get-go.

There are also startups that provide human labor to label data. Figure Eight (née CrowdFlower) springs to mind, as does Mighty AI in the autonomous vehicle space. My concern applies to them as well, but it's possible they do a better job vetting the mystery workers powering these services, and it's also possible they pay better. I honestly don't have enough information to make an accurate assessment.

However, what both Figure Eight and Mighty AI also do, and what a growing number of startups are working on, is provide tooling that lets companies automate and streamline data labeling as much as possible themselves. This, along with synthetic data, seems like the right approach to solving the data-labeling problem in the long run. And for companies that do enough of it, perhaps there's an opportunity to hire employees (or at least long-term contractors) whose primary job is labeling data and otherwise ensuring it's accurate -- kind of like law firms hire young lawyers for document review.
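On the synthetic-data point, here's a toy sketch of the core idea: generate examples that are born labeled, so nobody has to be paid to label them after the fact. The templates and vocabulary are invented, and real synthetic-data pipelines are far more sophisticated than this:

import random

# Toy synthetic-data sketch: labeled sentiment examples from templates,
# no human labelers involved. Everything here is invented for illustration.
POSITIVE = ["great", "fantastic", "reliable"]
NEGATIVE = ["terrible", "buggy", "useless"]
TEMPLATE = "The product is {a} and the support team was {b}."

def synthetic_example():
    label = random.choice(["pos", "neg"])
    words = POSITIVE if label == "pos" else NEGATIVE
    return TEMPLATE.format(a=random.choice(words), b=random.choice(words)), label

# 1,000 examples, each arriving with its label attached -- no Turkers required.
dataset = [synthetic_example() for _ in range(1000)]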

As long as high-quality training data is a critical asset to the success of AI, it's probably a good idea to treat it as such. If that means paying more money and spending more time on labeling it, then so be it. Surely we don't want the AI revolution delayed over something as mundane as label accuracy when there are so many juicier reasons to be concerned.

AI and machine learning

Cloud and infrastructure

Data and analytics