Part 0: Introduction

Data science articulated, data science examples, history and context, technology landscape

Part 1: Data Manipulation, at Scale

Databases and the relational algebra

Readings

MapReduce, Hadoop, relationship to databases, algorithms, extensions, language; key-value stores and NoSQL; tradeoffs of SQL and NoSQL

Readings

Data cleaning, entity resolution, data integration, information extraction (NOT COVERED IN LECTURES)

Readings / Talks

Part 2: Analytics

Topics in statistical modeling and experiment design

Readings

Introduction to machine learning, supervised learning, decision trees/forests, simple nearest neighbor

Readings

Unsupervised learning: k-means, multi-dimensional scaling

Readings

Part 3: Interpreting and Communicating Results

Visualization, visual data analytics

Readings (well, watchings)

Backlash: Ethics, privacy, unreliable methods, irreproducible results

Part 4: Graph Analytics

Readings