STA 290 Seminar Series
Thursday, February 25th, 4:10pm, MSB 1147 (Colloquium Room)
Refreshments at 3:30pm in MSB 4110 (Statistics Lounge)
Speaker: Aaditya Ramdas (UC Berkeley)
Title: "The p-Filter: multilayer FDR control for grouped hypotheses"
(Joint work with Rina Foygel Barber, U Chicago Statistics)
Abstract: False discovery rate (FDR) control has recently proved to be critical in scientific applications involving testing multiple hypotheses on the same dataset, and is intricately related to the recent public controversy regarding reproducibility of scientific results. In many practical applications of multiple hypothesis testing, the hypotheses can be naturally partitioned into groups, and one may want to control not only the number of false discoveries (wrongly rejected individual hypotheses), but also the number of falsely discovered groups of hypotheses (a group is falsely discovered if at least one hypothesis within it is rejected when, in reality, none of the hypotheses within that group should have been rejected). In this paper, we introduce the p-filter, a generalization of the standard FDR procedure of Benjamini and Hochberg (1995), and prove that our proposed method can simultaneously control the finer-level overall FDR (individual hypotheses treated separately) and the coarser-level group FDR (when such groups are user-specified).
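For readers less familiar with the terminology, the two error rates mentioned above can be written as follows; these are the standard textbook definitions rather than notation taken from the talk itself. Here V is the number of falsely rejected individual hypotheses and R is the total number of rejections, and a group counts as discovered if at least one hypothesis in it is rejected:

    FDR = E[ V / max(R, 1) ]

    FDR_group = E[ #{falsely discovered groups} / max(#{discovered groups}, 1) ]

The p-filter aims to keep both quantities below user-chosen levels simultaneously.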
We then generalize the p-filter procedure even further to handle multiple partitions of the hypotheses, since that is natural in many applications. For example, in neuroscience experiments, we may have a hypothesis for every (discretized) location in the brain and at every (discretized) timepoint: does the stimulus correlate with activity in location x at time t after the stimulus was presented? In this setting, one might want to group hypotheses by location or by time (or both). Our procedure naturally generalizes to handle multiple possible partitions of the hypotheses; in the above example, this would amount to controlling overall FDR over all voxels and time points, FDR at each individual voxel over all time points, and FDR over all voxels at a particular time point. The method is theoretically very robust, and can handle partitions that are nonhierarchical (i.e., one partition may arrange p-values by voxel and another by time point, with neither nested inside the other). The assumptions we need are standard in the literature: we do not require the hypotheses to be independent, but we do require a nonnegative dependence condition known as PRDS. We verify our findings with simulations showing that this technique not only achieves the aforementioned multilayer FDR control, but also improves precision when hypotheses that are likely to be rejected together are grouped together, allowing the scientist to flexibly employ field-specific prior knowledge.
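As background for the talk, the single-layer Benjamini-Hochberg step-up procedure that the p-filter generalizes can be sketched in a few lines of Python. This is only the classical baseline, not the p-filter itself, and the function name and default level alpha below are illustrative choices:

    import numpy as np

    def benjamini_hochberg(pvals, alpha=0.05):
        """Standard Benjamini-Hochberg (1995) step-up procedure at level alpha.
        Returns a boolean mask of rejected hypotheses; this is the single-layer
        baseline that the p-filter generalizes, not the p-filter itself."""
        pvals = np.asarray(pvals, dtype=float)
        n = len(pvals)
        order = np.argsort(pvals)                     # p-values from smallest to largest
        thresholds = alpha * np.arange(1, n + 1) / n  # step-up thresholds alpha*k/n
        below = pvals[order] <= thresholds
        rejected = np.zeros(n, dtype=bool)
        if below.any():
            k = np.nonzero(below)[0].max()            # largest k with p_(k) <= alpha*k/n
            rejected[order[:k + 1]] = True            # reject the k+1 smallest p-values
        return rejected

The p-filter, as described in the abstract, extends this idea so that an analogous guarantee holds simultaneously at the level of individual hypotheses and at the level of each user-specified partition into groups.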
Bio: Aaditya Ramdas is a postdoctoral researcher in EECS and Statistics at UC Berkeley, advised by Michael Jordan and Martin Wainwright. He finished his PhD in Statistics and Machine Learning at CMU, advised by Larry Wasserman and Aarti Singh. His recent interests include high-dimensional hypothesis testing, sequential analysis, statistics on groups, and concentration inequalities for MCMC.