Statistics Seminar
Building faster and more expressive BART models
Speaker: Sameer Deshpande
Event Details
Abstract: Bayesian Additive Regression Trees (BART) is an easy-to-use and highly effective nonparametric regression model that approximates unknown functions with a sum of binary regression trees. Most implementations of BART are based on trees that (i) recursively partition continuous inputs one variable at a time; (ii) rely on one-hot encodings of categorical predictors; and (iii) represent piecewise constant functions. These implementations are fundamentally limited in their ability to learn complex decision boundaries that are not aligned with coordinate axes; to “borrow strength” across multiple groups; to leverage structural relationships between multiple categorical predictors (e.g., adjacency and nesting); and to estimate smooth functions.
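For context, the sum-of-trees representation underlying BART can be written as follows (the notation is the standard convention and is not taken from the abstract itself):

\[ y_i = \sum_{m=1}^{M} g(x_i;\, T_m, \mu_m) + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2), \]

where each \(T_m\) is a binary regression tree, \(\mu_m\) collects its leaf outputs, and \(g(x_i; T_m, \mu_m)\) returns the leaf value of the leaf to which \(x_i\) is assigned.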
I will describe several extensions of BART based on new priors for decision rules and leaf outputs that overcome these limitations. A recurring theme across these extensions is the use of simple Metropolis-Hastings proposals that permit linear-time regression tree updates, thereby enabling much more flexible BART models without substantial additional computational burden. I will situate these developments within a larger program to understand how the effects of playing American-style tackle football in adolescence on later-life health vary over time and across the population.
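As a point of reference (this is the generic Metropolis-Hastings acceptance rule for tree proposals, not the specific proposals developed in the talk), a candidate tree \(T'\) drawn from a proposal distribution \(q(\cdot \mid T)\) is accepted with probability

\[ \alpha(T, T') = \min\left\{1,\; \frac{q(T \mid T')\, p(T')\, p(\mathbf{y} \mid T')}{q(T' \mid T)\, p(T)\, p(\mathbf{y} \mid T)} \right\}, \]

where \(p(T)\) is the tree prior and \(p(\mathbf{y} \mid T)\) is the marginal likelihood with the leaf outputs integrated out; the abstract's point is that the proposals can be designed so that each such update costs only linear time.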