Learning outcomes:
1. Achieve a basic understanding of the concepts of statistical learning.
2. Learn about the language of statistical learning (supervised/unsupervised; training of a procedure; training set/test set).
3. Learn about some more advanced concepts (loss function, risk) and why they are important.
4. Learn how to honestly assess the performance of algorithms using validation and test sets.
5. Learn about some more advanced techniques, and develop a heuristic understanding of when they provide useful information and when they do not.
Course content:
1. Overview of supervised and unsupervised procedures.
2. Concept of loss function and empirical risk.
3. Regularization and the principle of cross-validation.
4. Classification using model-based and model-free techniques, and evaluating classifier performance.
5. Clustering methodologies.
6. Dimension reduction techniques.
7. Nonparametric smoothing techniques.
Illustrative Reading:
1. James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R. Springer.
Potential Overlap:
This course has significant overlap with material covered in STA 142A and ECS 171. It also has some overlap with the contents of STA 35C.
This course covers similar material but with a different emphasis. Unlike the other courses, STA 109 is targeted primarily at students from the social, environmental, and biological sciences. Students are not expected to have the same mathematical preparation as in STA 142A and ECS 171.
STA 109 will be the upper-division anchor course for a proposed interdisciplinary major in data science aimed at less technically inclined students. These students will take STA 15A-C as their preparatory course sequence rather than STA 35A-C, which addresses the overlap in content between STA 109 and STA 35C.
History:
None