First things first

OK, this time I'm actually keeping it brief. There are a lot of interesting items in this issue, but these three really stuck out:

  • The use of cloud-native technologies such as Docker containers and Kubernetes in data science is growing at the expense of traditional Big Data (Hadoop/Spark).

  • Google Cloud’s data services outrank those of Amazon Web Services (AWS) and Microsoft Azure. Although Google Cloud is the third-largest cloud provider, its focus on data services is paying off with the Anaconda community.

  • Metacat: Making big data discoverable and meaningful at Netflix (Netflix): A few things stand out about this federated data discovery service that Netflix built, including how it combines technologies new and old (from Pig to Presto). The post also mentions similar systems at Twitter and LinkedIn, both of which are likewise optimized for those companies' specific data environments. I get the impression data federation is still a much-desired, but unsolved, goal, which raises the question of how successful any off-the-shelf tooling could really be at solving it.

Sponsor: Neo4j

The ARCHITECHT Show podcast

AI and machine learning

Sponsor: MongoDB

Cloud and infrastructure

Sponsor: Replicated

Data and analytics