ComputerClassMCMC_pt1_CodeActivities.Rmd

---
title: "ST308 Computer Workshop MCMC part 1 - with code for the activities"
output:
  html_document:
    df_print: paged
---


## Gibbs Sampler

We will illustrate a Gibbs sampler via the following toy example. Let $y=(y_1,\dots,y_n)$ be a random sample from the N($\mu,\sigma^2$). Leet's use the improper prior $\pi(\mu,\sigma^2)\propto (\sigma^2)^{-1}$ for simplicity.

As shown in class the full conditional distributions for $\mu$ and $\sigma^2$ are the N$(\bar{y},\frac{\sigma^2}{n}$ and the IGamma$\left(n/2, \frac{1}{2}\sum_{i=1}^n(y_i-\mu)^2\right)$ respectively.

### Demonstration

We will generate $1000$ independent N($\mu,\sigma^2$) random variables setting $\mu$ and $\sigma^2$ to some values. 

```{r}
set.seed(10)
N=1000;
mu = 5;
sigma = 4;
y = rnorm(N,mu,sigma);
mean(y)
sd(y)
```

Then we will run the Gibbs Sampler for the model above to sample from the posterior of $\mu$ and $\sigma$
```{r}
Niter = 1000 # number of MCMC iterations
out_mu = rep(NA,Niter); #vector to store the mu draws
out_sigma2 = rep(NA,Niter); #vector to store the sigma2 draws
mu = 20; # initial value for mu
sigma2 = 2; # initial value for sigma2

xbar = mean(y) # sufficient stat for mu
alpha = N/2; # alpha parameter of the IGamma full conditional for sigma2

#Main loop of Gibbs Sampler
for (iter in 1:Niter){
  # Store mu and sigma2 values
  out_mu[iter] = mu 
  out_sigma2[iter] = sigma2
  
  #update mu given sigma2 (and y)
  mu = rnorm(1,xbar,sqrt(sigma2/N))
  
  #update sigma2 given mu (and y)
  beta = 0.5*sum((y-mu)^2);
  sigma2  = 1/rgamma(1,alpha,beta);
}
```

Finally we will summarise the posterior output to obtains Bayes estimators and 95\% credible intervals.

```{r}
plot(out_mu,type='l')
plot(out_sigma2,type = 'l')
print('mu')
print(c(mean(out_mu),median(out_mu),quantile(out_mu,probs=c(0.025,0.975))))
print('sigma2')
print(c(mean(out_sigma2),median(out_sigma2),quantile(out_sigma2,probs=c(0.025,0.975))))
```

### Activity

Repeat the above exercise for the following model
$$
y_i\sim \text{Poisson}(\lambda), \;\;i=1,...,N,
$$
with priors 
$$
\lambda \sim \text{Gamma}(2,\beta),
$$
$$
\beta \sim \text{Exponential}(1)
$$
It can be shown (good exercise) that the full conditionals for $\lambda,\beta$ are the Gamma$(2 + \sum_i y_i,n+\beta)$.
and the Gamma$(3,1+\lambda)$.

Data can be simulated in the following way:

```{r}
set.seed(5)
N=1000;
beta = 1;
lambda = rgamma(1,2,beta);
y = rpois(N,lambda);
mean(y)
lambda
```

#### Code for activity

Run the Gibbs Sampler for the model above to sample from the posterior of $\lambda$ and $\beta$
```{r}
Niter = 1000 # number of MCMC iterations
out_lambda = rep(NA,Niter); #vector to store the lambda draws
out_beta = rep(NA,Niter); #vector to store the beta draws
lambda = 10; # initial value for lambda
beta = 5; # initial value for beta

alpha.l = 2+sum(y) # alpha parameter for the IGamma full conditional of lambda
#Main loop of Gibbs Sampler
for (iter in 1:Niter){
  # Store lambda and beta values
  out_lambda[iter] = lambda 
  out_beta[iter] = beta
  
  #update lambda given beta (and y)
  lambda = rgamma(1,alpha.l,N+beta)
  
  #update beta given lambda (and y)
  beta  = rgamma(1,3,1+lambda);
}
```

Finally we will summarise the posterior output to obtains Bayes estimators and 95\% credible intervals.

```{r}
plot(out_lambda,type='l')
plot(out_beta,type = 'l')
print('lambda')
print(c(mean(out_lambda),median(out_lambda),quantile(out_lambda,probs=c(0.025,0.975))))
print('beta')
print(c(mean(out_beta),median(out_beta),quantile(out_beta,probs=c(0.025,0.975))))
```

## RStan

### Demonstration

```{r}
library("rstan")
rstan_options(auto_write = TRUE)
```


First simulate data and put them in a list.

```{r}
set.seed(10)
N=1000;
mu = 5;
sigma = 4;
obs = rnorm(N,mu,sigma);
mean(obs)
sd(obs)

toy_dat <- list(N=N, y=obs)
toy_dat
```

For the model see the file `ToyExample.stan`. Next we run MCMC

```{r}
## for MCMC
?stan
fit <- stan(file = 'ToyExample.stan', data = toy_dat, init=0)
## For variational Bayes
#ToyModel <- stan_model(file='ToyExample.stan')
#vb(ToyModel,data=toy_dat)
```


Finally, we inspect the output of the MCMC draws.

```{r}
print(fit)
plot(fit)
pairs(fit, pars = c("mu", "sigma"))
traceplot(fit)
```

#### Activity

Work with Boston dataset from the MASS library and fit a Bayesian linear regression model for the response variable 'medv' (median value home price) based on the independent variables.

The data can be loaded with the code below. A linear regression model (non-Bayesian) is also fit below for reference.

```{r}
library(MASS)
?Boston
summary(Boston)
lreg <- lm(medv~.,data=Boston)
summary(lreg)
```

The model and priors are given below

$$
y\sim N(X\beta,\sigma^2)
$$
$$
\sigma \sim N(0,100^2),\;\;\;\sigma>0
$$

$$
\beta_i \sim N(0,100^2), \;\;\;i=1,\dots,p.
$$

The data can be prepared in the following way and put into the list 'boston_dat'

```{r}
y = Boston$medv
n = length(y) # number of observations
X = cbind(rep(1,n),subset(Boston, select = -medv)) # add one column of 1's for the constant
X
p = dim(X)[2] # number of variables plus the constant

boston_dat=list(n=n, p=p, X=X, y=y)
head(boston_dat)
```

##### Hints for the stan file

 - The X matrix needs to declared in the data section using *matrix[n,p] X;* 
 - The value p should also be declared there in a similar manner with n.
 - For matrix/vector multiplication in Stan use A * B. If you want to multiply componentwise use A .* B.
 - the following lines of code do the same thing
```{}  
  for (i in 1:n)
    x[i] ~ normal(0,1)
```  

```{}
x ~ normal(0,1)
```

Write your stan file ans put your code for running and reporting MCMC (using stan) below:

```{r}
NormalLR <-stan(file = 'LinearRegression.stan', data = boston_dat,chains=1,init=0,seed=1)
```

```{r}
print(NormalLR,digits_summary = 3)
traceplot(NormalLR,pars=c('beta','sigma'))
```