Causata

Causata Blog

Big Data at Strataconf

Thursday, 3 February 2011

Several of the Causata team and I are at O'Reilly Strataconf this week, a conference all about big data and the exciting possibilities it offers.

In a panel discussion in the opening session yesterday, Mike Olson of Cloudera made a very important point: big data is all very well, but things really get interesting when you combine diverse datasets.

This is an emerging theme at the conference; I'm seeing it everywhere.

In a tutorial on tools and techniques for the data analyst, Drew Conway and Hilary Mason took bit.ly data on web users who had clicked on links to Strataconf and combined it with location data to plot a map. Interestingly for me, the Bay Area was only the second-largest circle on the map - the largest was London! (Causata is in both.)

Pete Skomoroch of LinkedIn combined the Strataconf attendee list with LinkedIn data to produce a visualization of the skills, companies, and job titles of the conference attendees.

Anthony Goldbloom of Kaggle announced the $3 million Heritage Health Prize, for using data mining to predict hospital admissions. There followed an interesting discussion on whether it was fair to include external datasets, as is increasingly done to win data mining competitions. The consensus was that it's fine if it brings better predictions, provided it doesn't deter entrants to the competition who don't have access to this data. For me this just highlights the need for tools and datasets to do this well. I saw an interesting demo of using Google Refine to reconcile data against publicly available datasets, and there has been much discussion about the emergence of data markets.

All this is music to our ears at Causata: we derive powerful actionable insights precisely by combining customer data from multiple channels to give a comprehensive view of every individual customer.

Jason
