STA 208: Statistical Methods in Machine Learning

Subject: STA 208
Title: Statistical Methods in Machine Learning
Units: 4.0
School: College of Letters and Science (LS)
Department: Statistics (STA)
Effective Term: 2013 Fall

Learning Activities

  • Lecture - 3.0 hours
  • Discussion/Laboratory - 1.0 hours

Description

Focus on linear and nonlinear statistical models. Emphasis on concepts, methods, and data analysis; formal mathematics kept to a minimum. Topics include resampling methods, regularization techniques in regression and modern classification, cluster analysis, and dimension reduction techniques. Uses professional-level software.

Prerequisites

STA 206; STA 207; STA 135; or their equivalents.

Expanded Course Description

Summary of Course Content: 
1. Resampling methods: jackknife, bootstrap, cross-validation (4 lectures)
2. Elements of nonparametric function estimation: density, regression (3 lectures) 
3. Regularized regression: ridge, lasso and generalizations, partial least squares (5 lectures) 
4. Classification and discrimination: Fisher’s linear discriminant analysis, multiclass logistic regression, regularized LDA, support vector machines (5 lectures)
5. Principles and methods of aggregation and boosting (4 lectures)
6. Cluster analysis: k-means, hierarchical methods (3 lectures) 
7. Nonlinear dimension reduction methods: multidimensional scaling, plus topics chosen from kernel principal component analysis, Laplacian eigenmaps, diffusion maps, spectral clustering, and locally linear embedding (4 lectures)


Illustrative Reading: 
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by T. Hastie, R. Tibshirani and J. Friedman, Springer. 

Potential Course Overlap: 
There is some overlap with material taught in STA 232A, STA 135, STA 232C and STA 243. The overlap with STA 135 and STA 243 is minor, and STA 208 covers more machine learning tools and principles than either of these courses. Knowledge of the material in STA 135 is nevertheless essential for the second half of the course. Although there is some overlap with STA 232A and STA 232C, these are core courses for Statistics Ph.D. students and are taught at a much higher level.