In data analysis, details matter
What does a limousine service have to do with brain science? Neuroscientist Bradley Voytek recently came to Microsoft's Mountain View campus to explain how data analysis can be applied to road networks and brain connections. He is a post-doc at UCSF as well as a data evangelist for chauffeur service startup Uber.
His "(Im)practical Data Analytics" talk started with a riddle: "How many people were born in Britain on Sept. 10, 1752?" He suggested how one might estimate the answer from population statistics and trends. The answer is 0. The dates from the 3rd to the 13th of September 1752 were skipped when the Gregorian calendar replaced the Julian calendar in Britain, so there was no Sept. 10 in 1752. Knowing the context of the data is critical to avoiding mistakes in analysis.
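The riddle is easy to reproduce in code. The following is a minimal sketch, not something shown in the talk: the helper name existed_in_britain is hypothetical, and it only knows about this one gap. It shows that a standard date library will accept the date without complaint, so the context has to be supplied by the analyst.

```python
from datetime import date

# In Britain, 2 September 1752 was followed directly by 14 September 1752,
# so the 3rd through the 13th never appeared on the calendar.
SKIPPED_START = (1752, 9, 3)
SKIPPED_END = (1752, 9, 13)


def existed_in_britain(year: int, month: int, day: int) -> bool:
    """Return False for the days dropped in the 1752 changeover.

    Hypothetical helper for illustration; it does not validate dates in
    general and ignores every other calendar quirk.
    """
    return not (SKIPPED_START <= (year, month, day) <= SKIPPED_END)


# Python's date type uses a proleptic Gregorian calendar, so this
# construction succeeds even though the day never occurred in Britain.
naive = date(1752, 9, 10)
print(naive.isoformat())
print(existed_in_britain(1752, 9, 10))
```

Any estimate built from population figures would run happily on the naive date; only the explicit check encodes the historical context.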
The brain fills in missing information. To demonstrate, he played sound clips of a person talking. The first one was badly muffled by noise, like a very crackly phone connection. I could only pick out a couple of words at the end. When he repeated the clip without the noise, the words were clear. He then replayed the original clip. This time I could understand every word, because the brain, once it knows what to expect, helps separate the voice from the noise.
Uber subscribers request a car with a driver from their smartphones. Uber maps where cars are available and where people want rides. Brad explained how he spent the summer of 2011 helping Uber analyze its data, which the company used to reduce the time people wait for a ride. Eventually the waiting time became too short, sometimes only 90 seconds, and customers grew dissatisfied because they needed time to get ready. Brad discovered at Uber that CEOs like timely, practical results. He urged anyone in an academic establishment to take a sabbatical and join a fast-paced startup.
For drivers to find passengers, latitude and longitude must map to the correct street address. Experimenting with 500 locations in Washington, D.C., he plotted charts showing that MapQuest's map data was much less accurate than that from Apple, Google or Microsoft. Exploring further, he discovered that MapQuest had omitted the directional part of street names. Dropping the "NW" from "16th Street NW" was causing errors. Details in data collection matter.
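One way such a comparison could be set up is sketched below. This is an assumption about the general approach, not a description of Brad's actual analysis: the function names has_quadrant and haversine_m are hypothetical, and the regular expression only covers the four D.C. quadrant suffixes. The idea is to flag addresses that lack a quadrant and to measure how far a geocoded point lands from a known reference point.

```python
import math
import re

# D.C. street addresses carry one of four quadrant suffixes.
QUADRANT = re.compile(r"\b(NW|NE|SW|SE)\b", re.IGNORECASE)


def has_quadrant(address: str) -> bool:
    """Return True if the address string contains a quadrant suffix."""
    return QUADRANT.search(address) is not None


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters, assuming a spherical Earth."""
    radius_m = 6_371_000.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


print(has_quadrant("16th Street NW"))
print(has_quadrant("16th Street"))
```

With a reference coordinate for each test address, haversine_m gives a per-address error in meters for each map provider, which is the kind of quantity that can then be charted side by side.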
When Brad wanted to study academic papers in neuroscience, he found far too many to read. So he and his wife Jessica created software to search for them online and see which words for drugs, diseases and brain regions occurred in the same paper. Plotting the connections between the words as a network gave new insights. You can check out the associations between brain-related terms on brainSCANr.com. It is useful for bridging the medical literature with the neuroscience literature. In the medical literature, over 3000 papers associated serotonin with migraines. In the neuroscience literature, almost 5000 papers related serotonin to the striatum, a region of the brain. Only 16 papers mentioned both migraines and the striatum. Two terms that are each strongly tied to serotonin but rarely appear together suggest an area with potential for more research.
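The co-occurrence idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not brainSCANr's actual code: the function names are hypothetical and the five "papers" are made-up term sets, not real literature counts.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(papers):
    """Count how many papers mention each unordered pair of terms."""
    counts = Counter()
    for terms in papers:
        # Sort so each pair is stored under one canonical key.
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts


def pair_count(counts, a, b):
    """Look up a pair regardless of the order the terms are given in."""
    return counts[tuple(sorted((a, b)))]


# Made-up example data: each set holds the terms found in one paper.
papers = [
    {"serotonin", "migraine"},
    {"serotonin", "migraine"},
    {"serotonin", "striatum"},
    {"serotonin", "striatum"},
    {"serotonin", "striatum", "migraine"},
]

counts = cooccurrence_counts(papers)
print(pair_count(counts, "serotonin", "migraine"))
print(pair_count(counts, "serotonin", "striatum"))
print(pair_count(counts, "migraine", "striatum"))
```

A pair whose count is low while both terms share a frequently co-occurring third term is the pattern described above. The pair counts also serve directly as edge weights when the terms are plotted as a network.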
The event was organized by The Hive, a new incubator for startups working with massive amounts of data, based in Palo Alto, Santa Monica and San Francisco. T.M. Ravi, a co-founder, noted at the beginning of the conference that The Hive aims to "take the complexity of developing applications away from the people who are building applications." They plan to invest in just a few startups, but give them plenty of hands-on help. The Hive presents Big Data Think Tank Meetups in both San Francisco and Silicon Valley.