STA 35A Statistical Data Science I


Goals:

1. Obtain a basic understanding of randomness, sampling variability, probability models and probability computations.
2. Understand differences among data types.
3. Become familiar with the basics of the R programming language.
4. Learn how to use visualization tools to deepen understanding of statistical concepts.
5. Learn how to use simulations to understand sampling distributions.
6. Understand fundamental concepts of hypothesis testing and confidence intervals.

Summary of course content:
1. Overview of data types – continuous (univariate and multivariate), categorical.
2. Introduction to R programming [emphasis on rules of computation involving vectors, matrices and data frames].
3. Basic statistical summaries for numerical and categorical data [emphasis on use of functions in R for numerical and visual summaries].
4. Rules of probability computation; conditional probability.
5. Basic probability models: Binomial, Normal and Poisson [emphasis on simulation and visualization of distributions and probability computation using R].
6. Sampling distributions of the sample mean and the sample proportion [emphasis on simulation studies and visualization of results using existing packages].
7. Hypothesis testing and confidence intervals for population mean and population proportion [integrated with computation using R through use of existing packages].

Illustrative Reading:
1. Ramsey, F. and Schafer, D. (2012). The Statistical Sleuth: A Course in Methods of Data Analysis, 3rd Edition. Cengage Learning.
2. Bruce, P. and Bruce, A. (2017). Practical Statistics for Data Scientists: 50 Essential Concepts. O'Reilly Media.
3. Matloff, N. (2012). The Art of R Programming. No Starch Press.

Potential Overlap:
The course has some overlap with the content of STA 032 and STA 100. However, topics such as regression, analysis of variance and exploratory data analysis are not covered in this course. In addition, this course aims to integrate programming and data visualization (in R) closely with the learning of statistical methodologies. There is also some overlap in content with STA 013, though this course aims to cover these topics in a more mathematically and computationally integrated manner.

History:
None