Preamble

The goal of this notebook is to bring together resources on the rstanarm R package, which provides an interface between the R programming environment and the Stan probabilistic programming language.

The creators of rstanarm say the goal of the package is to,

make Bayesian estimation routine for the most common regression models that applied researchers use. This will enable researchers to avoid the counter-intuitiveness of the frequentist approach to probability and statistics with only minimal changes to their existing R scripts.

If you are already fitting models using lm() or glm() form the stats package or using the mixed effects versions lmer() and glmer() from lme4, then it will be easy to translate those models to a Bayesian framework to get the Bayesian benefits. To do so, you:

A great place to get started is the CRAN rstanarm page, which has a ton of helpful vignettes. This document is my attempt to pull together pieces of information that I found helpful. In it, we will:

Set up

Clear the workspace and set some global options for the notebook.

Load packages and set plotting aesthetics.

Loading required package: Rcpp
rstanarm (Version 2.15.3, packaged: 2017-04-29 06:18:44 UTC)
- Do not expect the default priors to remain the same in future rstanarm versions.
Thus, R scripts should specify priors explicitly, even if they are just the defaults.
- For execution on a local, multicore CPU with excess RAM we recommend calling
options(mc.cores = parallel::detectCores())
Loading required package: Matrix
Loading tidyverse: ggplot2
Loading tidyverse: tibble
Loading tidyverse: tidyr
Loading tidyverse: readr
Loading tidyverse: purrr
Loading tidyverse: dplyr
Conflicts with tidy packages --------------------------------------------------------------------------
expand(): tidyr, Matrix
filter(): dplyr, stats
lag():    dplyr, stats

Attaching package: ‘magrittr’

The following object is masked from ‘package:purrr’:

    set_names

The following object is masked from ‘package:tidyr’:

    extract

A lot of the code in this tutorial is inspired by a blogpost from Tristan Mahr where he models the effect of sleep deprivation on Reaction Time using the sleepstudy dataset. The post is great! And it’s a really nice example of how to use visualization to understand what’s going on in a data analysis model.

Here is some backgound information on the dataset:

The average reaction time per day for subjects in a sleep deprivation study. On day 0 the subjects had their normal amount of sleep. Starting that night they were restricted to 3 hours of sleep per night. The observations represent the average reaction time on a series of tests given each day to each subject.

Load the data and look at the structure.

Observations: 180
Variables: 3
$ Reaction <dbl> 249.5600, 258.7047, 250.8006, 321.4398, 356.8519, 414.6901, 382.2038, 290.1486, 4...
$ Days     <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, ...
$ Subject  <fctr> 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 309, 309, 309, 309, 309, 309, ...

Let’s get a sense of the variability due to random effects by plotting the relationship between days of sleep deprivation and Reaction Time for each participant.

We can also summarize these data in table by computing the mean and standard deviation for each day. These summary statistics will also be used as parameters in our simulated dataset.

Days m stdev
0 256.65 32.13
1 264.50 33.43
2 265.36 29.47
3 282.99 38.86
4 288.65 42.54
5 308.52 51.77
6 312.18 63.17
7 318.75 50.10
8 336.63 60.20
9 350.85 66.99

Simulation

To help us understand the models in the RStanArm package, let’s write a few functions to simulate these data. Note the repeated measures structure: that we have multiple measures for each participant that come from different days. Participants could vary in their baseline RTs and they could also vary in how they respond to sleep deprivation, so we want to build this into our simulation.

Next, let’s write a function to generate model parameters for each participant. Note that the user can control:

Test our simulate_participant() function with no added noise and the same age effect as in the original dataset.

Now that we can simulate the data for a single participant, let’s simulate the entire experiment.

Plot the simulated data by participant with regression lines fit to each participants’ data.