ARCHITECHT Daily: A startup promises to solve massive data with a bitmap index

By ARCHITECHT • Issue #67
An Austin, Texas, startup called Pilosa launched today with an open source product it says can vastly speed up access to massive datasets, and also make it much easier to federate them in a single place. The key piece of its technology is a distributed bitmap index (essentially, storing data as single bits—0s or 1s) that’s designed to fit in-memory. Pilosa says its index technology can sit on top of pretty much any data store, and that its closest natural competitor is Elasticsearch.
Pilosa spun out of a sports-fan-management company called Umbel, which used (and uses) the technology for customer segmentation. Pilosa also claims some early users in the bioinformatics and network security spaces. 
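For readers who want a feel for the general idea, here is a minimal sketch of a bitmap index applied to a segmentation-style question. It is a toy, single-machine illustration and not Pilosa's code or API: the `BitmapIndex` class, the `ids` helper and the sample fan records are all hypothetical, and Python integers stand in for the bitmaps.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy in-memory bitmap index (hypothetical, not Pilosa's API)."""

    def __init__(self):
        # Maps (attribute, value) -> int whose bit i is 1 if record i has that value.
        self.bitmaps = defaultdict(int)

    def add(self, record_id, attribute, value):
        # Set the bit for this record in the bitmap for (attribute, value).
        self.bitmaps[(attribute, value)] |= 1 << record_id

    def bitmap(self, attribute, value):
        # Unknown values yield an empty bitmap.
        return self.bitmaps.get((attribute, value), 0)


def ids(bitmap):
    """Expand a bitmap back into the sorted list of record IDs it contains."""
    result = []
    position = 0
    while bitmap:
        if bitmap & 1:
            result.append(position)
        bitmap >>= 1
        position += 1
    return result


# Made-up sample data: (record ID, attribute, value).
fans = [
    (0, "team", "spurs"), (0, "city", "austin"),
    (1, "team", "spurs"), (1, "city", "dallas"),
    (2, "team", "mavs"), (2, "city", "austin"),
    (3, "team", "spurs"), (3, "city", "austin"),
]

index = BitmapIndex()
for record_id, attribute, value in fans:
    index.add(record_id, attribute, value)

# A segment is a bitwise AND of the relevant bitmaps.
spurs_in_austin = index.bitmap("team", "spurs") & index.bitmap("city", "austin")
print(ids(spurs_in_austin))
```

The point of the sketch is that a segment query never touches the underlying records. It only combines compact bitmaps, which is why this kind of index can plausibly live in memory and sit in front of other data stores.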
I spoke with CEO H.O. Maycotte and VP of Product Troy Lanier, who say the company’s initial focus is on building an open source community (here’s the Pilosa GitHub repo) and figuring out what the next set of ideal use cases might be, although Pilosa does have an enterprise edition as well. 
Maycotte used a pretty illustrative analogy to describe how people might think about Pilosa and where it would fit into their current big data environments:
“The card catalog’s always sat in the front of the library, but the truth is that the library is now just a very small piece of the information that you have access to as a student or as a citizen of a city. What we’re proposing now is, let’s make the card catalog its own building, and let’s not just look at the library that’s next door. Let’s look at all the libraries at once.
“That’s really what we want to do to your data. If it’s huge and it’s fragmented, we want to help you access it faster.”
The problems Pilosa is trying to solve are obviously a huge deal today, with exploding volumes of data coming off everything from genomes to connected devices, and with data silos still so prevalent inside large companies. However, even assuming the technology works as advertised beyond its home at Umbel, Pilosa faces the challenge of convincing folks to give it a try, and then of convincing them to re-architect their data environments to incorporate yet another distinct technology on top of/in lieu of Hadoop (et al.), Spark, Elasticsearch, Neo4j or whatever else they’re using. 
If the company is lucky, some common use cases and architectures will emerge from its community-building efforts, and it can leverage more well-known partner technologies as a foot in the door for educating the market.
In other news (with only a hint of self-promotion), accelerator 500 Startups announced the 500 Startups Data Track today, which will focus on companies building big data, machine learning and AI technologies. Chris Neumann (formerly of DataHero and Aster Data, and now venture partner at 500 Startups) is heading it up, and I’m proud to be among the group of speakers/mentors/coaches for the first batch of companies. I’ll be providing guidance mainly around marketing and PR, two things about which I have some experience on both sides of the equation.
The program kicks off on June 17, and interested startups can apply here.

Sponsor: Cloudera
Artificial intelligence
This actually isn’t an AI use case, which is kind of the point. It can be overkill at times.
Yesterday, I questioned the market for SigOpt, a company focused on tuning neural networks. This tutorial showing how to use it with AWS and MXNet shows its promise, if not its opportunity.
This seems like a fair assessment of how financial advisers might use AI, although consumers might like the focus to be less on what they want and more on what’s most likely to make them money.
Listen to the ARCHITECHT Show podcast. New episodes every Thursday!
Cloud and infrastructure
The Docker and Kubernetes train keeps moving even once DockerCon ends. Here are three good posts addressing opportunity in the container space:
Just look at the first chart, showing internal traffic compared with public internet traffic, and you understand immediately why companies like Facebook and Google are investing so heavily in networking right now.
I’m not sure this is a data center infrastructure play, but it’s worth noting anytime Cisco buys a company for that much cash.
They’re all bad! (But Google might be a little better.) Despite all the focus on UI the past several years, complicated tech and legacy product remain a challenge.
They’re just getting smaller and more optimized, thanks in part to the cloud freeing up capacity and costs.
They’ve created a technique called “valleytronics” that could help manage electron movement and reduce heat. Of course, many workloads might have moved on from x86 by the time this is commercially viable.
And has a valuation of $1.3 billion. I included a link to this rumor a few weeks ago, and here’s the confirmation. Must work pretty well to justify that level of investment in a crowded space.
All things data
Apparently, SQL on Hadoop is not yet a solved problem. You have to wonder who the user base is, especially as more companies buy Hadoop from vendors with their own preferred tools for this.
A college professor prohibits his students from building projects based on exploration, because it’s too “unbounded” and “no one is paid to explore.” His argument makes some sense, and is definitely worth reading.
I think by this point we could actually start with something about how data is or is not used. We probably understand the tradeoff well enough to do that in an effective manner.
The most interesting news, analysis, blog posts and research in cloud computing, artificial intelligence and software engineering. Delivered daily to your inbox. Curated by Derrick Harris. Check out the Architecht site at