Hadoop is often taken to be synonymous with Big Data, the solution to an organization’s decision-making ills, the cure for cancer, the Caramilk secret and more. But this is, of course, not the case. Hadoop merely provides a foundation for working with much of the data that was previously unreachable or unusable.

Similarly, the term data science is much bandied about.

So I attended the inaugural talk at Data Science Vancouver by Don Turnbull about this new discipline, data science. The talk wasn’t recorded, but you can follow Don’s exposition through his slides. My 5-second summary is that data science is essentially a renaissance of science that pairs new techniques with (dare I say it) the old-school scientific method. As Don puts it, it is science, after all.

Don concluded with a great reading list for anyone either getting into this field or wanting/needing to deepen their understanding of it. For your convenience, I’ve reassembled the list here with Amazon links. (I didn’t know that Isaac Asimov’s Foundation is not available in ebook format. The other books in the series are… just not the first. Weird.)

Fiction

  1. Foundation, by Isaac Asimov 

Statistics

  1. The Control Revolution, by James R. Beniger 
  2. Against the Gods: The Remarkable Story of Risk, by Peter L. Bernstein
  3. Data Mining: Practical Machine Learning Tools and Techniques, by Ian H. Witten and Eibe Frank (and Mark Hall?)
  4. The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century, by David Salsburg
  5. The Laws of the Web: Patterns in the Ecology of Information, by Bernardo A. Huberman
  6. The Rise of Statistical Thinking, 1820-1900, by Theodore M. Porter
  7. When Information Came of Age: Technologies of Knowledge in the Age of Reason and Revolution, 1700-1850, by Daniel R. Headrick
  8. Men of Mathematics, by E. T. Bell

Knowledge Discovery and Data Mining (KDD)

  1. Advances in Knowledge Discovery and Data Mining, edited by Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth and Ramasamy Uthurusamy
  2. Machine Learning, by Tom M. Mitchell
  3. Readings in Information Visualization: Using Vision to Think, by Stuart K. Card, Jock D. Mackinlay and Ben Shneiderman