by Cindy Atoji Keene
The data scientist has been called the sexiest job of the 21st century. The emergence of this role – an elusive blend of software engineer, statistician, and business analyst – is akin to a cowboy trying to harness the wild frontier of “Big Data.” As organizations wrestle with the volumes of information generated by mobile devices, social computing, the cloud, and more, data scientists help turn data into insights that companies can use.
Big Data will add 1.9 million IT jobs in the U.S. alone by 2015, with a majority of those hires happening in Boston and Silicon Valley. Some of those data scientists are brilliant, quirky and nocturnal, said Andrew Schwartz, Lattice Engines’ co-founder and data scientist, who lives on a sailboat on Boston Harbor, rides a bright green recumbent bike to work, and does his best work between midnight and 4 a.m. “To really understand a problem, it helps to be completely isolated with my computer. I have to sit and stare at it because it takes me a while to get everything in my brain. The longer I can work and understand the problem, the more productive I am,” said Schwartz, who created Lattice Engines to help businesses make more informed decisions about sales and marketing opportunities. Companies, for example, can predict which customers and prospects are most likely to buy products, based on data about purchase history and customer service records, as well as outside sources such as LexisNexis, tax records, and other databases.
Q: Is there an art to data science?
A: Data science is a mixture of business analysis, analytics modeling, and data management, and you have to know how to look at large amounts of data to help businesses gain a competitive edge. It’s like a scientist painting a picture: you have to do some analysis and sketch out what the main figures are, while also teasing out the fine-grain details, like choosing your colors. A data scientist needs to be methodical but also make guesses.
Q: Data scientists can look at data and spot trends. What trends have you discovered?
A: In a previous company, for example, I worked to help a casino optimize their slot machines. We did some controlled experiments to see how customers responded to payoffs, or winning the jackpot. We discovered that the size of the ‘take’ is not as important as the frequency and distribution, meaning that people are very responsive to hearing someone else win, even if it’s just a small amount of money. They are more likely to stay around and play if they hear other people hit the jackpot.
Q: How did you become a data scientist?
A: I have a Ph.D. in Math from Harvard. Like everyone else, I went off and taught. I was standing in front of a calculus class of 150 students when I realized that none of the students really wanted to be there, and no one would ever use what I was teaching. After classes ended, I went sailing for three months with a programming book, and by the end of my trip, I knew how to program.
Q: What are some buzzwords among data scientists?
A: Hadoop is a system to break up problems and put together results; Hive is a tool to make queries and build up a particular analysis. You also might be using SAS or R as a statistics library. The other big buzzwords are machine learning, regression analysis, and a nice one, bagging and boosting.
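For readers curious what "break up problems and put together results" looks like in practice, here is a minimal single-machine sketch in Python. It is an illustration of the general split, process, and combine pattern only, not Hadoop's actual API and not anything Schwartz described building; the function names and the sample lines are hypothetical.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(records, chunk_size):
    """Break the full problem into smaller, independent pieces."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Process one piece on its own: count the words in its lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def combine(left, right):
    """Put two partial results together into one."""
    return left + right


def word_count(records, chunk_size=2):
    # On a real cluster each chunk would go to a different machine;
    # here the pieces are simply processed one after another.
    partials = [map_chunk(chunk) for chunk in split_into_chunks(records, chunk_size)]
    return reduce(combine, partials, Counter())


# Hypothetical sample data, made up for this illustration.
lines = ["big data is big", "data science is new", "big boats cost more"]
totals = word_count(lines)
```

Because each chunk is counted independently, the final totals do not depend on how the lines are divided up, which is what lets this kind of work be spread across many machines.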
Q: What’s the future of the data scientist?
A: We are right now in the early stages of data science, so for the next five to 10 years, we’re going to be seeing a lot of projects opening up that address the types of problems that you need Big Data to handle. In the next 15 years, as the basic footprint of Big Data spreads into a range of domains, it will be time to find commonality and pull together a single Big Data optimization platform.
Q: Give us an analogy of how big Big Data is.
A: Big can be very big. It’s equivalent to the total of every email sent or received by Gmail for the past four years; or every financial transaction by every bank; or a reading of all the information on every person on every flight in the US for the past three years.
Q: Are you using data analysis to buy a new boat?
A: Yes, I am looking for a slightly bigger boat, so I’m following the classified ads and doing a data analysis for pricing, including how long a boat has been on the market, and the relationship between price, space, and retail value. I’m spending all my time trying to correct my models of analysis. My boat broker hates me.