The UC Berkeley and UC Davis Departments of Statistics invite you to:
2021 BERKELEY / DAVIS JOINT STATISTICS COLLOQUIUM
WEDNESDAY APRIL 21, 2021
We hope you can join us for this annual event, which is being held remotely this year (links below). It features two student talks along with the main presentation by Prof. James Sharpnack (UC Davis). There are also mixer events you can join; links are below!
SCHEDULE:
2:30 - 3:00 Student mixer organized by Berkeley SGSA: https://spatial.chat/s/statlounge
3:00 - 3:30 Student talk: Qin Ding (UC Davis). Zoom link: https://berkeley.zoom.us/j/92161870927
3:30 - 4:00 Student talk: Jake Soloff (UC Berkeley). Zoom link: https://berkeley.zoom.us/j/92161870927
4:00 - 5:00 Main presentation: James Sharpnack (Assistant Professor, UC Davis). Zoom link: https://berkeley.zoom.us/j/92161870927
5:00 - 6:00 SpatialChat mixer: https://spatial.chat/s/statlounge
ABSTRACTS:
Main Presentation (4:00pm)
Speaker: James Sharpnack
Title: Public health data and trend filtering
Abstract: We start this talk by outlining the efforts our group is making to help combat the spread of Covid-19 in our community through the Healthy Davis Together (HDT) project (http://healthydavistogether.org) and in collaboration with the Delphi Lab (http://delphi.cmu.edu). Through a customer discovery process, we identified several data processing and analysis tasks that would assist policy makers and scientists, including the HDT executive committee, the California Department of Public Health (CDPH), and data journalists. Many of the most important tasks identified in this way are not simple prediction tasks; instead, they can be cast as spatial clustering, segmentation, changepoint localization, and outbreak detection, to name a few. We will highlight one specific problem: spatio-temporal segmentation of CA county test positivity proportions using graph trend filtering (GTF). Trend filtering and graph segmentation provide locally adaptive function estimates that can solve a wide range of problems, including small area estimation. There are also public policy applications in which the segmentation properties of GTF are desirable, because they lead to predictions that are more interpretable and demonstrably fair. We will take a brief tour of GTF, including some of the theoretical advances made in the past several years. Special emphasis will be placed on the gaps in our understanding of GTF and on what these results tell us about the denoising “power” of different graph structures. Returning to the application at hand, we apply GTF to Covid-19 test positivity proportions of CA counties using a mobility network built from SafeGraph mobility data. This work is in collaboration with the Delphi Lab, CDPH, and the Healthy Davis Together project.
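For readers new to graph trend filtering, here is a minimal sketch, not the speaker's implementation. It assumes the penalized form common in the GTF literature: minimize (1/2)||y - β||² + λ||Δ^(k+1) β||₁, where Δ^(1) is the oriented edge incidence matrix of the graph and higher orders alternate left-multiplication by its transpose and by itself (so Δ^(2) is the graph Laplacian, and k = 0 is the graph fused lasso). The solver is plain ADMM, chosen here only for brevity; the helper names, the toy path graph, and the parameter values are hypothetical, and no Covid-19 or SafeGraph data are used.

```python
import numpy as np

def incidence_matrix(edges, n):
    """Oriented edge-vertex incidence matrix: row e has -1 at node i and +1 at node j."""
    D = np.zeros((len(edges), n))
    for e, (i, j) in enumerate(edges):
        D[e, i] = -1.0
        D[e, j] = 1.0
    return D

def gtf_operator(D, k):
    """Build Delta^(k+1): start from Delta^(1) = D, then alternate D.T @ (.) and D @ (.)."""
    op = D
    for order in range(1, k + 1):
        # op currently equals Delta^(order); odd orders are left-multiplied by D.T
        op = D.T @ op if order % 2 == 1 else D @ op
    return op

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def graph_trend_filter(y, edges, lam, k=0, rho=1.0, n_iter=2000):
    """Hypothetical ADMM solver for min_b 0.5*||y - b||^2 + lam*||Delta^(k+1) b||_1."""
    y = np.asarray(y, dtype=float)
    Delta = gtf_operator(incidence_matrix(edges, len(y)), k)
    z = np.zeros(Delta.shape[0])   # split variable, constrained to equal Delta @ beta
    u = np.zeros(Delta.shape[0])   # scaled dual variable
    A = np.eye(len(y)) + rho * Delta.T @ Delta
    for _ in range(n_iter):
        beta = np.linalg.solve(A, y + rho * Delta.T @ (z - u))   # quadratic beta-update
        Db = Delta @ beta
        z = soft_threshold(Db + u, lam / rho)                     # sparsity-inducing z-update
        u += Db - z                                               # dual ascent step
    return beta

# Toy example: a noisy step signal on a 6-node path graph (made-up numbers).
y = np.array([0.1, -0.1, 0.0, 1.1, 0.9, 1.0])
path_edges = [(i, i + 1) for i in range(len(y) - 1)]
beta_hat = graph_trend_filter(y, path_edges, lam=0.2, k=0)
print(np.round(beta_hat, 3))
```

With lam = 0 the sketch returns y unchanged; as lam grows on a connected graph, the estimate fuses toward the overall mean of y, and intermediate values trade data fit against the number of distinct "segments" across edges.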
STUDENT PRESENTATIONS:
Presentation # 1: 3:00pm
Speaker: Qin Ding
Title: An Efficient Algorithm For Generalized Linear Bandit: Online Stochastic Gradient Descent and Thompson Sampling
Abstract: We consider the contextual bandit problem, where a player sequentially makes decisions based on past observations to maximize the cumulative reward. Although many algorithms have been proposed for the contextual bandit problem, most of them rely on computing the maximum likelihood estimator at each iteration, which requires $O(t)$ time at the $t$-th iteration and is memory-inefficient. A natural way to resolve this problem is to apply online stochastic gradient descent (SGD), so that the per-step time and memory complexity can be reduced to a constant with respect to $t$; however, a contextual bandit policy based on online SGD updates that balances exploration and exploitation has remained elusive. In this work, we show that online SGD can be applied to the generalized linear bandit problem. The proposed SGD-TS algorithm, which uses a single-step SGD update to exploit past information and Thompson Sampling for exploration, achieves $\tilde{O}(\sqrt{T})$ regret with total time complexity that scales linearly in $T$ and $d$, where $T$ is the total number of rounds and $d$ is the number of features. Experimental results show that SGD-TS consistently outperforms existing algorithms on both synthetic and real datasets.
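The following is a minimal sketch of the general idea, not the SGD-TS algorithm as published: each round performs one SGD step on the logistic log-likelihood (logistic regression being a common generalized linear model) and chooses an arm by Thompson-style sampling around the current estimate. The isotropic Gaussian perturbation with scale sigma/sqrt(t), the step-size schedule, the simulated environment, and all names are assumptions made for illustration; the published method specifies its own exploration distribution and tuning, which are not reproduced here. What the sketch does show is that each round costs O(Kd) time and O(d) memory for K arms, independent of t, in contrast to refitting the MLE on all t past observations.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def sgd_ts_sketch(theta_star, T=3000, n_arms=5, step=0.5, sigma=0.5, seed=0):
    """Hypothetical online-SGD + Thompson-style loop for a simulated logistic bandit."""
    rng = np.random.default_rng(seed)
    d = len(theta_star)
    theta = np.zeros(d)          # running estimate: O(d) memory, no stored history
    total_reward = 0.0
    for t in range(1, T + 1):
        X = rng.standard_normal((n_arms, d))            # simulated contexts, one row per arm
        # Exploration (assumed form): perturb the estimate, noise shrinking as 1/sqrt(t)
        theta_tilde = theta + (sigma / np.sqrt(t)) * rng.standard_normal(d)
        x = X[np.argmax(X @ theta_tilde)]               # act greedily w.r.t. the sampled parameter
        r = float(rng.random() < sigmoid(x @ theta_star))  # Bernoulli reward from the true GLM
        # Exploitation: a single SGD step on this round's logistic negative log-likelihood
        theta -= (step / np.sqrt(t)) * (sigmoid(x @ theta) - r) * x
        total_reward += r
    return theta, total_reward

theta_star = np.array([1.0, -1.0, 0.5])   # made-up true parameter
theta_hat, reward = sgd_ts_sketch(theta_star)
print(theta_hat, reward)
```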
Presentation # 2: 3:30pm
Speaker: Jake Soloff
Title: Estimating an Unknown Prior from Heterogeneous Data via Nonparametric Maximum Likelihood
Abstract: Statistical inference of stellar populations is complicated by significant observational limitations, in particular by multivariate, heteroscedastic measurement errors. Empirical Bayes is attractive in such settings, but assumptions about the form of the prior distribution can be hard to justify. We extend the nonparametric maximum likelihood estimator (NPMLE) to allow for multivariate and heteroscedastic errors. The NPMLE estimates an arbitrary prior by solving an infinite-dimensional convex optimization problem; we show that it can be tractably approximated by a finite-dimensional version. We show that the empirical Bayes posterior means have low regret, meaning that they closely approximate the posterior means one would compute with the true prior in hand. Furthermore, the NPMLE can be used for a variety of other purposes, such as density estimation and deconvolution. We apply the method to astronomy data to construct a fully data-driven color-magnitude diagram of 1.4 million stars. This is joint work with Aditya Guntuboyina and Bodhisattva Sen.
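A minimal sketch of the finite-dimensional approximation idea: restrict the prior to a fixed grid of atoms and maximize the marginal likelihood over the grid weights, then form empirical Bayes posterior means. This version is univariate with Gaussian errors whose standard deviations σ_i are known and differ across observations (heteroscedastic), whereas the talk treats the multivariate case; it uses the classical EM fixed-point update for mixture weights, which may differ from the authors' solver. The grid, the simulated two-point prior, and the function names are hypothetical.

```python
import numpy as np

def npmle_grid_em(x, sigma, grid, n_iter=500):
    """Approximate the NPMLE of the prior by mixture weights on a fixed grid (EM updates)."""
    # L[i, j] = density of x_i under N(grid_j, sigma_i^2): heteroscedastic Gaussian likelihood
    z = (x[:, None] - grid[None, :]) / sigma[:, None]
    L = np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * sigma[:, None])
    w = np.full(len(grid), 1.0 / len(grid))      # start from uniform weights
    for _ in range(n_iter):
        P = L * w                                  # joint weight of (observation i, atom j)
        P /= P.sum(axis=1, keepdims=True)          # posterior over atoms for each observation
        w = P.mean(axis=0)                         # EM update: average posterior responsibility
    return w, L

def log_marginal_likelihood(w, L):
    """Log-likelihood of the data under the grid prior with weights w."""
    return float(np.sum(np.log(L @ w)))

def posterior_means(w, L, grid):
    """Empirical Bayes posterior means E[theta_i | x_i] under the estimated prior."""
    P = L * w
    P /= P.sum(axis=1, keepdims=True)
    return P @ grid

# Simulated example: the true prior puts mass 1/2 on each of -2 and +2 (made-up setup).
rng = np.random.default_rng(0)
n = 500
theta = rng.choice([-2.0, 2.0], size=n)
sigma = rng.uniform(0.5, 1.5, size=n)            # observation-specific noise levels
x = theta + sigma * rng.standard_normal(n)
grid = np.linspace(-4.0, 4.0, 81)
w_hat, L = npmle_grid_em(x, sigma, grid)
pm = posterior_means(w_hat, L, grid)
```

Because each EM step never decreases the marginal likelihood, the fitted weights are at least as good as the uniform starting point; a finer grid tightens the approximation to the infinite-dimensional problem at higher cost.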