Forest plots using R and ggplot2

Forest plots are most commonly used in reporting meta-analyses, but they can also be used profitably to summarise the results of a fitted model. They display the estimates of model parameters together with their confidence intervals.

Matt Shotwell just posted a message to the R-help mailing list with his lattice-based solution to the problem of creating forest plots in R. I just figured out how to create a forest plot for a consulting report using ggplot2. The geom_pointrange layer makes this process very easy!

Update January 26, 2016: ggplot2 has changed a bit in the last five years. I’ve created a gist that will be easier to maintain. The link is here.

credplot.gg <- function(d){
  # d is a data frame with 4 columns:
  # d$x gives variable names
  # d$y gives center point
  # d$ylo gives lower limits
  # d$yhi gives upper limits
  library(ggplot2)
  p <- ggplot(d, aes(x = x, y = y, ymin = ylo, ymax = yhi)) +
    geom_pointrange() +                         # point estimate with its interval
    geom_hline(yintercept = 0, linetype = 2) +  # dashed reference line at zero
    coord_flip() +                              # put the variables on the vertical axis
    xlab('Variable')
  return(p)
}

If we start with some dummy data, like

set.seed(1)  # makes the random dummy data reproducible
d <- data.frame(x = toupper(letters[1:10]),
                y = rnorm(10, 0, 0.1))
d <- transform(d, ylo = y - 1/10, yhi = y + 1/10)

credplot.gg(d)

we get a forest plot with one row per variable, each showing its estimate and interval, plus a dashed reference line at zero.
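Since forest plots can also summarise a fitted model, here is a minimal sketch of building the input data frame from a regression fit instead of dummy data. It assumes a linear model on R's built-in mtcars data (my choice of example, not from the original post) and uses coef() and confint() for the estimates and 95% intervals; the plotting layers are the same as in credplot.gg.

```r
library(ggplot2)

# Example model (an assumption for illustration): fuel economy vs. three predictors
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
ci  <- confint(fit)  # 95% confidence intervals by default

# Assemble the 4-column layout credplot.gg expects: x, y, ylo, yhi
d <- data.frame(x   = names(coef(fit)),
                y   = unname(coef(fit)),
                ylo = unname(ci[, 1]),
                yhi = unname(ci[, 2]),
                stringsAsFactors = FALSE)
d <- d[d$x != "(Intercept)", ]  # the intercept is usually left out of the plot

# Same layers as credplot.gg
p <- ggplot(d, aes(x = x, y = y, ymin = ylo, ymax = yhi)) +
  geom_pointrange() +
  geom_hline(yintercept = 0, linetype = 2) +
  coord_flip() +
  xlab('Variable')
```

Printing p draws the plot; with credplot.gg defined, credplot.gg(d) gives the same result.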

17 comments

  1. Abhijit,

    Thanks for the mention and nice post. I think the ggplot2 solution to this is really simpler than anything we can do with lattice. Unfortunately, I’m still procrastinating about learning ggplot2 well enough for everyday use. Examples like this make it easier, I think.

    -Matt

    1. Matt,

      I learned a lot from this month’s talk at our useR meetup by Harlan Harris (available at the DC useR meetup site, and also on Harlan’s blog). I learned lattice first and was partial to it for a long time, mainly because it is faster than ggplot. However, once I understood the “logic” behind creating plots in ggplot, it seemed very easy and flexible. The speed issue still irks me, but for smallish data sets it’s now my preferred graphics platform in R.

      Abhijit

  2. Abhijit,

    Thanks for the excellent post. This can be especially useful in genetics. Would you mind if I slightly modified the code and reposted it on my own blog, citing this post of course?

    Stephen.

    1. Stephen,

      Feel free to modify and post this code. It’s pretty rough, and not production quality. It’s really proof-of-principle code, and very easy code at that.

      Abhijit

    2. Hi Steve, have you been able to code it to obtain a “diamond” for the summary of all data? I have been struggling with this for quite some time now.

    1. Well, if you use either Matt’s function or mine (mine would need some more polish, and the Getting Genetics Done blog might have a more polished version soon), it is equally easy right now, since both are presented as generic R functions, and the inner workings would be hidden to the user 🙂

      Both graphics engines have a learning curve. lattice is closer in syntactical spirit to base graphics and to a lot of R’s formula and conditioning syntax in general, so it might feel more familiar to a newcomer. ggplot2 has syntax that is not quite so straightforward (and abuses some conventions), but it has made some sounder graphical choices and makes it easier to build customizable graphs once you understand the logic of building layers to make up a graph (in other words, the grammar). I used lattice for many years and resisted moving to ggplot2, but now I tend to use ggplot2 more for my quick graphing needs. If something doesn’t work in ggplot2, I go back to lattice or base graphics.

      Deepayan Sarkar’s book on lattice is a pretty good starting point. Hadley Wickham’s book and website on ggplot2 are good references, but I’d look for online tutorials (like the DC, Bay Area and New York R meetup sites, or the Learn R blog) to get an initial start.
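On the question of a summary “diamond”: one approach is to draw it as a polygon. The sketch below is my own illustration, not code from this thread; the study results and the pooled estimate are made-up numbers, and in practice the pooled estimate would come from your meta-analysis software. Switching the variable axis to numeric row positions lets the diamond sit in its own row at the bottom, with scale_x_continuous() restoring the labels.

```r
library(ggplot2)

# Made-up study-level estimates with intervals (hypothetical data)
studies <- data.frame(x = paste("Study", 1:5),
                      y = c(0.25, 0.10, 0.32, 0.18, 0.22),
                      stringsAsFactors = FALSE)
studies <- transform(studies, ylo = y - 0.15, yhi = y + 0.15)

# Hypothetical pooled estimate and interval
pooled <- list(est = 0.21, lo = 0.14, hi = 0.28)

# Numeric row positions: studies at 2..6, summary row at 1 (bottom after coord_flip)
studies$pos <- seq_len(nrow(studies)) + 1

# Diamond vertices: left tip = lower limit, right tip = upper limit,
# top and bottom tips at the pooled estimate
diamond <- data.frame(pos = c(1, 1.3, 1, 0.7),
                      y   = c(pooled$lo, pooled$est, pooled$hi, pooled$est))

p <- ggplot(studies, aes(x = pos, y = y, ymin = ylo, ymax = yhi)) +
  geom_pointrange() +
  geom_polygon(data = diamond, aes(x = pos, y = y), inherit.aes = FALSE) +
  geom_hline(yintercept = 0, linetype = 2) +
  scale_x_continuous(breaks = c(1, studies$pos),
                     labels = c("Summary", studies$x)) +
  coord_flip() +
  xlab('Study')
```

The 0.3 half-height of the diamond is an arbitrary choice; any value below 0.5 keeps it inside its own row.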

  3. Hi Abhijit,
    I’m wondering whether I can add a table with the betas beside this graph. Is there any link you can provide that would help me add more details to a forest plot for publication?
    Thanks,
    SD
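For SD’s question about showing the estimates next to the plot: one simple approach (my suggestion, not from the post) is a geom_text layer placed at a fixed position to the right of the intervals. The label format and the offsets below are arbitrary choices; this is a sketch, not a publication-ready layout.

```r
library(ggplot2)

set.seed(1)  # reproducible dummy data, as in the post
d <- data.frame(x = toupper(letters[1:10]),
                y = rnorm(10, 0, 0.1),
                stringsAsFactors = FALSE)
d <- transform(d, ylo = y - 1/10, yhi = y + 1/10)

# Text column: "estimate (lower, upper)", placed just right of the widest interval
d$label    <- sprintf("%.2f (%.2f, %.2f)", d$y, d$ylo, d$yhi)
d$text_pos <- max(d$yhi) + 0.05

p <- ggplot(d, aes(x = x, y = y, ymin = ylo, ymax = yhi)) +
  geom_pointrange() +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_text(aes(y = text_pos, label = label), hjust = 0, size = 3) +  # left-aligned text
  expand_limits(y = max(d$text_pos) + 0.3) +  # leave room on the right for the text
  coord_flip() +
  xlab('Variable')
```

Because coord_flip() keeps text horizontal, hjust = 0 makes each label start at text_pos and run to the right.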

  4. Hi everyone,
    The forest plot is very nice. However, I have a question: when I tried it, at most 24 variable names were displayed on the forest plot. Could you please tell me how to display more than 24 variable names?
    Thanks a lot!
    Liming
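I can’t tell from the description why only 24 names appear. One possibility is that the plot is too short for that many rows, so labels crowd each other out; a hedged suggestion is to scale the output height with the number of variables and shrink the axis text. Note also that toupper(letters[...]) in the dummy example only provides 26 names, since letters has 26 elements. The names Var01, Var02, …, the 0.2 inches per row and the font size 7 below are arbitrary assumptions.

```r
library(ggplot2)

n <- 40  # number of variables (hypothetical)
set.seed(2)
d <- data.frame(x = sprintf("Var%02d", seq_len(n)),  # 40 unique names
                y = rnorm(n, 0, 0.1),
                stringsAsFactors = FALSE)
d <- transform(d, ylo = y - 1/10, yhi = y + 1/10)

# Fix the row order so the first variable is drawn at the top after coord_flip
d$x <- factor(d$x, levels = rev(d$x))

p <- ggplot(d, aes(x = x, y = y, ymin = ylo, ymax = yhi)) +
  geom_pointrange() +
  geom_hline(yintercept = 0, linetype = 2) +
  coord_flip() +
  xlab('Variable') +
  theme(axis.text.y = element_text(size = 7))  # smaller labels on the vertical axis

# Scale the height with the number of rows (about 0.2 in per variable, at least 4 in)
out <- tempfile(fileext = ".pdf")
ggsave(out, p, width = 6, height = max(4, 0.2 * n), units = "in")
```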

  5. Hi Abhijit,
    Thanks for the magical code. It works great! I used your code and Matt’s to make my forest plot, available here: (https://www.dropbox.com/s/76im4sqo9y6qvws/Metanalysis.tiff?dl=0). Now I want to extend it a bit by including 4 parameters (represented via a legend) within each study. Is it possible to achieve this by adding something to your code, just as ggplot2 lets us use fill=parameter?

    Thanks,
    Akshay

    1. Well, showing 4 parameters in one plot is difficult at the best of times. My solution uses ggplot2, so everything you can do in ggplot2 applies. All I’m really using is geom_pointrange as the geom. If you want to add fill, color, and other kinds of aesthetics, it should be pretty straightforward.

    2. Hi Akshay, could you kindly share the R code you used to produce this pretty forest plot? I really need it to better explain the results of a Poisson GLM fit.
      Thanks in advance for your support!
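Following up on showing several parameters per study: a minimal sketch, assuming long-format data with one row per study-parameter pair, maps colour to the parameter and uses position_dodge() so the intervals within a study don’t overlap. The study and parameter names and the dodge width of 0.6 are made up for illustration. Colour rather than fill is used because geom_pointrange’s default point shape has no fill.

```r
library(ggplot2)

set.seed(3)
# Hypothetical long-format data: 3 studies x 4 parameters
d <- expand.grid(study = paste("Study", 1:3),
                 parameter = paste("Parameter", 1:4),
                 stringsAsFactors = FALSE)
d$y <- rnorm(nrow(d), 0, 0.1)
d <- transform(d, ylo = y - 1/10, yhi = y + 1/10)

dodge <- position_dodge(width = 0.6)  # spread the 4 intervals within each study row

p <- ggplot(d, aes(x = study, y = y, ymin = ylo, ymax = yhi, colour = parameter)) +
  geom_pointrange(position = dodge) +
  geom_hline(yintercept = 0, linetype = 2) +
  coord_flip() +
  xlab('Study')
```

The colour mapping produces the legend automatically.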

  6. Dear Abhijit,
    when trying to run your script for the forest plot I get this error message:
    “Error: geom_hline requires the following missing aesthetics: yintercept”
    Do you have any idea how to solve this problem?
    Thanks in advance for your kind reply.
    Daniele
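I can’t be sure what triggered this error in Daniele’s setup; an older ggplot2 release is a plausible culprit, given that ggplot2 has changed since the post was written, and with a current release the code in the post should run as written. If updating isn’t an option, one hedged workaround is to give the reference line its own data frame and mapping, and switch off inherited aesthetics so the layer doesn’t pick up x, ymin and ymax from the main plot. The names ref and yint are my own.

```r
library(ggplot2)

set.seed(1)
d <- data.frame(x = toupper(letters[1:10]), y = rnorm(10, 0, 0.1),
                stringsAsFactors = FALSE)
d <- transform(d, ylo = y - 1/10, yhi = y + 1/10)

# The reference line gets its own one-row data frame and mapping;
# inherit.aes = FALSE stops it inheriting x, y, ymin and ymax from ggplot()
ref <- data.frame(yint = 0)

p <- ggplot(d, aes(x = x, y = y, ymin = ylo, ymax = yhi)) +
  geom_pointrange() +
  geom_hline(data = ref, aes(yintercept = yint), linetype = 2, inherit.aes = FALSE) +
  coord_flip() +
  xlab('Variable')
```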

  7. Thanks for the code, Abhijit. It saved me oodles of time in figuring out how to plot my ORs and their limits.
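For odds ratios, the no-effect value is 1 rather than 0, and ratios are usually drawn on a log scale so that intervals computed on the log-odds scale appear symmetric. A minimal sketch with made-up ORs and limits: move the reference line to 1 and add scale_y_log10(). With coord_flip(), the y scale is the one drawn horizontally.

```r
library(ggplot2)

# Hypothetical odds ratios with 95% confidence limits
d <- data.frame(x   = c("Age", "Sex", "Smoking"),
                y   = c(1.20, 0.80, 2.10),
                ylo = c(0.90, 0.60, 1.40),
                yhi = c(1.60, 1.10, 3.20),
                stringsAsFactors = FALSE)

p <- ggplot(d, aes(x = x, y = y, ymin = ylo, ymax = yhi)) +
  geom_pointrange() +
  geom_hline(yintercept = 1, linetype = 2) +  # no-effect line at OR = 1
  scale_y_log10() +                           # log scale for ratios
  coord_flip() +
  xlab('Variable') +
  ylab('Odds ratio (log scale)')
```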
