The goal of this notebook is to bring together resources on the *rstanarm* R package, which provides an interface between the R programming environment and the Stan probabilistic programming language.

The creators of *rstanarm* say the goal of the package is to,

make Bayesian estimation routine for the most common regression models that applied researchers use. This will enable researchers to avoid the counter-intuitiveness of the frequentist approach to probability and statistics with only minimal changes to their existing R scripts.

If you are already fitting models using lm() or glm() form the *stats* package or using the mixed effects versions lmer() and glmer() from *lme4*, then it will be easy to translate those models to a Bayesian framework to get the Bayesian benefits. To do so, you:

- add
`stan_`

to your models - specify a prior distribution (or use the default, weakly informative priors)
- learn how to “inspect” the posterior distribution and report the results

A great place to get started is the CRAN *rstanarm* page, which has a ton of helpful vignettes. This document is my attempt to pull together pieces of information that I found helpful. In it, we will:

- Fit different kinds of linear models to a simulated dataset: MLE, LMM, and Bayesian LMM
- Learn how to extract posterior samples from a fitted model object
- See a set of tools for visualizing distributions (both priors and posteriors) in a Bayesian analysis

Clear the workspace and set some global options for the notebook.

Load packages and set plotting aesthetics.

```
Loading required package: Rcpp
rstanarm (Version 2.15.3, packaged: 2017-04-29 06:18:44 UTC)
- Do not expect the default priors to remain the same in future rstanarm versions.
Thus, R scripts should specify priors explicitly, even if they are just the defaults.
- For execution on a local, multicore CPU with excess RAM we recommend calling
options(mc.cores = parallel::detectCores())
Loading required package: Matrix
Loading tidyverse: ggplot2
Loading tidyverse: tibble
Loading tidyverse: tidyr
Loading tidyverse: readr
Loading tidyverse: purrr
Loading tidyverse: dplyr
Conflicts with tidy packages --------------------------------------------------------------------------
expand(): tidyr, Matrix
filter(): dplyr, stats
lag(): dplyr, stats
Attaching package: ‘magrittr’
The following object is masked from ‘package:purrr’:
set_names
The following object is masked from ‘package:tidyr’:
extract
```

A lot of the code in this tutorial is inspired by a blogpost from Tristan Mahr where he models the effect of sleep deprivation on Reaction Time using the `sleepstudy`

dataset. The post is great! And it’s a really nice example of how to use visualization to understand what’s going on in a data analysis model.

Here is some backgound information on the dataset:

The average reaction time per day for subjects in a sleep deprivation study. On day 0 the subjects had their normal amount of sleep. Starting that night they were restricted to 3 hours of sleep per night. The observations represent the average reaction time on a series of tests given each day to each subject.

Load the data and look at the structure.

```
Observations: 180
Variables: 3
$ Reaction <dbl> 249.5600, 258.7047, 250.8006, 321.4398, 356.8519, 414.6901, 382.2038, 290.1486, 4...
$ Days <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, ...
$ Subject <fctr> 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 309, 309, 309, 309, 309, 309, ...
```

Let’s get a sense of the variability due to random effects by plotting the relationship between days of sleep deprivation and Reaction Time for each participant.

We can also summarize these data in table by computing the mean and standard deviation for each day. These summary statistics will also be used as parameters in our simulated dataset.

Days | m | stdev |
---|---|---|

0 | 256.65 | 32.13 |

1 | 264.50 | 33.43 |

2 | 265.36 | 29.47 |

3 | 282.99 | 38.86 |

4 | 288.65 | 42.54 |

5 | 308.52 | 51.77 |

6 | 312.18 | 63.17 |

7 | 318.75 | 50.10 |

8 | 336.63 | 60.20 |

9 | 350.85 | 66.99 |

To help us understand the models in the RStanArm package, let’s write a few functions to simulate these data. Note the repeated measures structure: that we have multiple measures for each participant that come from different days. Participants could vary in their baseline RTs and they could also vary in how they respond to sleep deprivation, so we want to build this into our simulation.

Next, let’s write a function to generate model parameters for each participant. Note that the user can control:

- the amount of participant-level variability using the
`ss_noise`

argument, which controls the width of the noise distribution. higher numbers result in more noise for each participant’s measurement. - the size of the effect of sleep deprivation using the
`location_shift`

argument. higher numbers result in a smaller effect of day

Test our simulate_participant() function with no added noise and the same age effect as in the original dataset.

Now that we can simulate the data for a single participant, let’s simulate the entire experiment.

Plot the simulated data by participant with regression lines fit to each participants’ data.