STA 35C Statistical Data Science III

Goals:

1. Understand basic concepts behind statistical learning methodologies and learn how and when to apply them.
2. Develop an understanding of the limitations of popular learning methods.
3. Deepen understanding of probability models (in particular "conditional probability").
4. Develop basic understanding of classification and clustering methodologies.

Summary of course content:
1. Conceptual summary of supervised and unsupervised learning.
2. Overview of conditional probability and the Bayesian paradigm.
3. Notions of model selection and regularization.
4. Elements of simultaneous inference - false discovery rate control procedures.
5. Concepts of the bootstrap and cross-validation - implementation in the context of inference and model selection for linear regression (including ridge regression).
6. Concepts of classification - LDA and logistic regression [use of existing packages].
7. Concepts of clustering - hierarchical and k-means clustering [use of existing packages].
8. Basic dimension reduction techniques - PCA [emphasis on visualization and interpretation].
9. Basics of nonparametric smoothing techniques [emphasis on the notion of bias-variance trade-off; use of existing packages].

Illustrative Reading:
1. James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R. Springer.

Potential Overlap:
There is some overlap with STA 142A, STA 141C, and ECS 171 in coverage of the core concepts and methodologies of statistical learning. However, the emphasis here is on introducing these concepts through extensive data analysis and the use of existing computational tools, with little reliance on detailed mathematical analysis or algorithmic implementation.

History:
None