STA 160 Practice in Statistical Data Science


Goals:
This capstone course focuses on the practice of data analysis and on both statistical and computational reasoning. Students work through every step of the data pipeline and workflow, gaining authentic experience in analyzing and working with data.

Summary of course contents:
Students will work on a data analysis project in groups of 3 to 4 members. They will:

a) frame the question and possible approaches,
b) acquire data (if necessary),
c) clean and explore the data,
d) use appropriate statistical and machine learning methods to effectively answer the question(s), and
e) prepare a technical report and a presentation (for a non-statistical audience) detailing the conclusions and insights, potential shortcomings and issues, and possible alternative approaches and directions.

The instructor will provide, select, or approve the projects. Sample problems may be adapted from journal papers, activities in previous offerings of course 260, and the consulting activities of both the department's StatLab and the campus's Data Science Initiative. As in course 260 and ECS 193A and 193B, instructors can also solicit problems from researchers on campus. Two or more groups may work independently on the same project.

Students will be introduced to the projects at the start of the course. The first 4 to 5 weeks will involve studying sample case studies in statistical data science. These case studies illustrate all of steps a) through e) above and prepare students to work on their projects. The instructor may also use lectures to introduce new statistical methods that arise in the case studies or in several team projects.

The course will also cover technical writing. Students will be encouraged to follow best practices such as version control and reproducible computation (e.g., using knitr or IPython notebooks).

Illustrative reading:

  • Statistics: A Guide to the Unknown, edited by Peck, Casella, Cobb, Hoerl & Nolan, 2005
  • Stat Labs: Mathematical Statistics Through Applications, Nolan & Speed, 2001
  • Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving, Nolan & Temple Lang, 2014

Potential Overlap:
This course is similar in some respects to course 260 but is taught at the undergraduate level.

History:
First offered Spring 2017.