ggmcmc packageThere is a pdf version of this document.
Follow the development of ggmcmc at github.
This documents presents the usage of ggmcmc, an R package with graphical tools for analyzing Markov Chain Monte Carlo simulations from Bayesian inference.
First, a fake dataset is produced. Then, JAGS is used to get samples from the posterior distribution and finally ggmcmc functions are used to generate the graphics needed for visual diagnostics of convergence of MCMC chains.
The current version (0.2) has minimal capabilities and will be extended in the future.
ggmcmc?
ggplot (based on the grammar of graphics) empowers R users by allowing them to flexibly crate graphics. Based on this idea, ggmcmc is aimed at bringing the ideas and implementation of ggmcmc to MCMC diagnostics, allowing mainly Bayesian inference to have better and more flexible visual diagnostic tools.
The code produces a minimal fake datased for a simple linear regression with an intercept (beta[1]) and a slope (beta[2]sigma) is also simulated.
require(rjags, quietly = TRUE) require(ggmcmc) set.seed(14718)
beta.0 <- 3 beta.1 <- 5 sigma <- 2
N <- 100 x <- rnorm(N, 0, 3) y <- beta.0 + beta.1 * x + rnorm(N, mean = 0, sd = sigma) X <- matrix(x, ncol = 1)
JAGS by writing it in a file using R and then invoking it from the same file when sampling:
write(" model { for (i in 1:N) { y[i] ~ dnorm(mu[i], tau) mu[i] <- beta[1] + beta[2] * X[i,1] } tau <- pow(sigma, -2) sigma ~ dunif(0, 100) beta[1] ~ dnorm(0, 0.001) beta[2] ~ dnorm(0, 0.001) }", "m-simple_linear_regression.bug")
JAGS, such as data, number of simulations, nodes to monitor, and initial values.
n.simu <- 2000 n.burnin <- n.simu/2 inits <- c(1, 5, -5) # tau (1) and betas (2) D <- list(y = y, X = X, N = N) par <- c("sigma", "beta")
JAGS.
m.jags <- jags.model("m-simple_linear_regression.bug", data = D, n.adapt = n.burnin, quiet = TRUE, n.chains = 2) S <- coda.samples(m.jags, par, n.iter = n.simu - n.burnin, quiet = TRUE)
JAGS output (coda) to ggmcmc format
Now the ``S'' object (which is a coda object) must be converted into a data frame with certain attributes (number of chains, parameters, iterations, and whether parallel computing can be used) that the graphical ggmcmc functions will understand. The ggs() function does that.
D <- ggs(S, parallel = FALSE)
head(D)
## Iteration Parameter value Chain ## 1 1 beta[1] 2.852 1 ## 2 2 beta[1] 2.883 1 ## 3 3 beta[1] 3.139 1 ## 4 4 beta[1] 2.888 1 ## 5 5 beta[1] 2.678 1 ## 6 6 beta[1] 2.866 1
str(D)
## 'data.frame': 6000 obs. of 4 variables: ## $ Iteration: int 1 2 3 4 5 6 7 8 9 10 ... ## $ Parameter: Factor w/ 3 levels "beta[1]","beta[2]",..: 1 1 1 1 1 1 1 1 1 1 ... ## $ value : num 2.85 2.88 3.14 2.89 2.68 ... ## $ Chain : int 1 1 1 1 1 1 1 1 1 1 ... ## - attr(*, "nChains")= int 2 ## - attr(*, "nParameters")= int 3 ## - attr(*, "nIterations")= int 1000 ## - attr(*, "parallel")= logi FALSE
By default ggs() tries to use parallel computing (which is a reasonable expectation for any Bayesian practitioner to have a multicore machine in 2012) .
library(doMC, quietly = TRUE) registerDoMC() D <- ggs(S)
What slows down ggmcmc is the current plotting, not the computation of several intermediate quantities needed for the plotting, so in this case parallel computing does not make substantial differences (only a 5\% increase in speed). But with larger parameters to analyze more chains and/or longer iterations it can make a difference.
ggmcmc()
ggmcmc() is a wrapper to several plotting functions that allows to create very easily a report of the diagnostics in a single pdf file. This output can be used to inspect the results more comfortable than using the plots that appear in the screen.
ggmcmc(D)
ggmcmc() plots 5 parameters in each page, and the dump file is called ``ggmcmc-output.pdf'', but we can change them.
ggmcmc(D, file = "model_simple-diag.pdf", param.page = 1)
We can also use the functions interactively to get the plots in the screen. The parameter to be plotted is the data frame that results from the application of the ggs() function to the original coda object that JAGS produces.
Notice that some of the functions differentiate the chains, but a few of them mix the chains to produce a single plot (ggs_histogram() and ggs_caterpillar().
ggs_histogram(D)

ggs_density(D)

ggs_traceplot(D)

ggs_running(D)

ggs_compare_partial(D)

ggs_autocorrelation(D)

ggs_crosscorrelation(D)

ggs_caterpillar(D)

Future releases of ggmcmc will include the possibility to plot only the parameters belonging to a family of parameters (as in beta[1], beta[2], ...).