Forest plots using R and ggplot2

Forest plots are most commonly used in reporting meta-analyses, but can be profitably used to summarise the results of a fitted model. They essentially display the estimates for model parameters and their corresponding confidence intervals.

Matt Shotwell just posted a message to the R-help mailing list with his lattice-based solution to the problem of creating forest plots in R. I just figured out how to create a forest plot for a consulting report using ggplot2. The availability of the geom_pointrange layer makes this process very easy!!

credplot.gg <- function(d){
# d is a data frame with 4 columns
# d$x gives variable names
# d$y gives center point
# d$ylo gives lower limits
# d$yhi gives upper limits
    library(ggplot2)
    # geom_hline() takes yintercept; after coord_flip() the reference line at 0 is drawn vertically
    p <- ggplot(d, aes(x=x, y=y, ymin=ylo, ymax=yhi)) + geom_pointrange() +
           coord_flip() + geom_hline(yintercept=0, linetype=2) + xlab('Variable')
    return(p)
}

If we start with some dummy data, like

d <- data.frame(x = toupper(letters[1:10]),
                y = rnorm(10, 0, 0.1))
d <- transform(d, ylo = y-1/10, yhi=y+1/10)

credplot.gg(d)

we can get the following graph:
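
The same four-column layout can also be built from a fitted model, which is the "summarise a fitted model" use mentioned at the top. The following is only a minimal sketch: the built-in mtcars data, the choice of predictors, and the names fit, d_fit and p_fit are assumptions made for illustration. The plotting layers are written out again so the block runs on its own.

```r
# Sketch: forest plot of regression coefficients from a fitted linear model.
# mtcars and the predictors below are illustrative assumptions.
library(ggplot2)

fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
est <- coef(fit)[-1]                      # point estimates, intercept dropped
ci  <- confint(fit)[-1, , drop = FALSE]   # confidence limits (95% by default)

# Same columns that credplot.gg() expects: x, y, ylo, yhi
d_fit <- data.frame(x   = names(est),
                    y   = unname(est),
                    ylo = ci[, 1],
                    yhi = ci[, 2],
                    row.names = NULL)

# Same layers as credplot.gg(), repeated here to keep the example self-contained
p_fit <- ggplot(d_fit, aes(x = x, y = y, ymin = ylo, ymax = yhi)) +
  geom_pointrange() +
  geom_hline(yintercept = 0, linetype = 2) +
  coord_flip() +
  xlab('Variable')
print(p_fit)
```

With credplot.gg() already defined, credplot.gg(d_fit) should give the same picture. Coefficients on very different scales (as here) can make the narrower intervals hard to see, so standardising the predictors first may be worth considering.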


9 comments

  1. Abhijit,

    Thanks for the mention and nice post. I think the ggplot2 solution to this is really simpler than anything we can do with lattice. Unfortunately, I’m still procrastinating about learning ggplot2 well enough for everyday use. Examples like this make it easier, I think.

    -Matt

    1. Matt,

      I learned a lot from our useR meetup talk this month by Harlan Harris (available at the DC useR meetup site, and also on Harlan’s blog). I learned lattice first and was partial to it for a long time, mainly due to its quickness compared to ggplot. However, once I realized the “logic” behind creating plots in ggplot, it seemed very easy and flexible. The speed thing still irks me, but for smallish data sets it’s my preferred graphics platform in R now.

      Abhijit

  2. Abhijit,

    Thanks for the excellent post. This can be especially useful in genetics. Would you care if I slightly modified the code and reposted to my own blog, citing this post of course?

    Stephen.

    1. Stephen,

      Feel free to modify and post this code. It’s pretty rough, and not production quality. It’s really proof-of-principle code, and very easy code at that.

      Abhijit

    1. Well, if you use either Matt’s function or mine (mine would need some more polish, and the Getting Genetics Done blog might have a more polished version soon), it is equally easy right now, since both are presented as generic R functions, and the inner workings would be hidden to the user :)

      Both graphics engines have a learning curve. lattice is closer in syntactical spirit to base graphics and a lot of R’s formula and conditioning syntax in general, so it might have a more familiar feel for a newcomer. ggplot2 has syntax that is not quite so straightforward (and abuses some conventions), but it has made some sounder graphical choices and makes it easier to build customizable graphs once you understand the logic of building layers to make up a graph (in other words, the grammar). I started with lattice for many years and resisted moving to ggplot2, but now, I tend to use ggplot2 more for my quick graphing needs. If something doesn’t work in ggplot2, I will go back to lattice or base graphics.

      Deepayan Sarkar’s book on lattice is a pretty good starting point. Hadley Wickham’s book and website on ggplot2 are good references, but I’d look for online tutorials (like the DC, Bay Area and New York R meetup sites, or the Learn R blog) to get an initial start.

  3. Hi Abhijit,
    I’m wondering whether I can add a table with the betas at the side of this graph. Is there any link you can provide that would help with adding more detail to a forest plot for publication?
    thanks,
    SD

  4. Hi Everyone,
    The forest plot is very nice. However, I have a question: when I tried it, the plot displayed at most 24 variable names. Could you please tell me how to display more than 24 variable names on a forest plot?
    Thanks a lot!
    Liming
