Flowing Data is a great website for data visualizations. I love the ones showing the sequence of Walmart store openings over time or the perfect choice of bike for each San Francisco neighborhood.
However, as someone who not only enjoys looking at data but also cares about actually doing something with it, I felt today's article really hit home.
For a few years the Netflix Prize has brought the top data scientists on the planet into an annual open competition to set a new bar for movie predictions. It's the modern-day equivalent of the KDD Cup, which was where Causata's COO Paul Phillips cut his teeth. The article highlights that the algorithm from the latest winning team was deemed not practical enough to put into production. I'm actually not that surprised, though to be fair I doubt the competition had any requirements for that.
Nevertheless, it does underscore a real gap I see in how people think about big data and machine learning predictions, especially when it comes to marketing or customer interactions. It's just not appreciated enough that the whole point of going through the effort to build a statistical model is to extract business value from it, and that doing so quickly is essential. When I hear SAS analysts talk about how long it typically takes to put a model into production, I'm always amazed. The months of recoding SAS code into SQL, the many compromises along the way, and the heavy validation effort in a database or data warehouse all raise the question of whether that is the right approach.
This is a topic where I'm 100% sure we're way ahead at Causata. If you've built a statistical model in a tool like SAS or R, it can be imported into Causata in seconds, and any individual can then be scored in the next instant. Causata's real-time scoring also incorporates the latest data, including customer interactions from the last second. So if a new website visitor arrives via a high-value search term, the score will immediately flag them as a hot prospect. There are dozens of important use cases related to this capability. It's a huge deal. Our philosophy is a bit like the agile software development process applied to data. You don't want a long period of time to pass before releasing a statistical model into production. Release early, release often.
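To make the idea concrete, here is a minimal sketch of what scoring on fresh data can look like. It is not Causata's API. It assumes the imported model is a logistic regression exported as a set of coefficients, and the feature names, weights, and event format are all made up for illustration.

```python
import math

# Hypothetical coefficients, as if exported from a logistic regression
# fitted in a tool like SAS or R. Names and values are illustrative only.
MODEL = {
    "intercept": -2.0,
    "weights": {
        "arrived_via_high_value_search": 2.5,
        "pages_viewed_this_session": 0.1,
        "past_purchases": 0.4,
    },
}


def score(model, features):
    """Return the logistic-regression probability for one customer."""
    z = model["intercept"]
    for name, weight in model["weights"].items():
        # Features the customer does not have yet count as zero.
        z += weight * features.get(name, 0.0)
    return 1.0 / (1.0 + math.exp(-z))


def apply_event(features, event):
    """Fold one new interaction into a copy of the customer's features."""
    updated = dict(features)
    if event["type"] == "search_arrival" and event.get("high_value_term"):
        updated["arrived_via_high_value_search"] = 1.0
    elif event["type"] == "page_view":
        updated["pages_viewed_this_session"] = (
            updated.get("pages_viewed_this_session", 0.0) + 1.0
        )
    return updated


# A brand-new visitor: score before and after the arrival event is seen.
visitor = {}
before = score(MODEL, visitor)
visitor = apply_event(visitor, {"type": "search_arrival", "high_value_term": True})
after = score(MODEL, visitor)
print(f"before: {before:.3f}, after: {after:.3f}")
```

The point of the sketch is the order of operations: the event updates the customer's features first, and the score is recomputed from those features right away, with no recoding of the model in between.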
Check out the full Netflix blog article that the Flowing Data piece refers to. The folks at Netflix explain how their business is evolving and how this has changed the nature of the predictions and personalization they need to perform. This is yet another reason why you need to get the data into production rapidly.
My prediction: in the next 12 months this agility topic is going to be top of mind for a lot more people. We already see this growing awareness among the top marketing and analytics customers we're working with. How top of mind is it for you?