Forest plots are most commonly used in reporting meta-analyses, but can be profitably used to summarise the results of a fitted model. They essentially display the estimates for model parameters and their corresponding confidence intervals.
Matt Shotwell just posted a message to the R-help mailing list with his lattice-based solution to the problem of creating forest plots in R. I just figured out how to create a forest plot for a consulting report using ggplot2. The availability of the geom_pointrange layer makes this process very easy!!
Update January 26, 2016: ggplot2 has changed a bit in the last five years. I’ve created a gist that will be easier to maintain. The link is here.
credplot.gg <- function(d){
  # d is a data frame with 4 columns
  # d$x gives variable names
  # d$y gives center point
  # d$ylo gives lower limits
  # d$yhi gives upper limits
  require(ggplot2)
  p <- ggplot(d, aes(x=x, y=y, ymin=ylo, ymax=yhi))+
    geom_pointrange()+
    geom_hline(yintercept = 0, linetype=2)+
    coord_flip()+
    xlab('Variable')
  return(p)
}
We can try the function out on some dummy data:
d <- data.frame(x = toupper(letters[1:10]), y = rnorm(10, 0, 0.1))
d <- transform(d, ylo = y-1/10, yhi=y+1/10)
credplot.gg(d)
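Since forest plots can also summarise a fitted model, here is a minimal sketch of how the four-column data frame might be built from a model's coefficients and confidence intervals. The model (an `lm` fit on the built-in `mtcars` data) and the helper name `coef_frame` are illustrative assumptions, not part of the original post.

```r
# Hypothetical helper: turn a fitted model into the data frame layout
# that credplot.gg() expects (columns x, y, ylo, yhi).
coef_frame <- function(fit, level = 0.95) {
  est <- coef(fit)
  ci  <- confint(fit, level = level)   # matrix of lower and upper limits
  d <- data.frame(
    x   = names(est),
    y   = unname(est),
    ylo = unname(ci[, 1]),
    yhi = unname(ci[, 2]),
    stringsAsFactors = FALSE
  )
  d[d$x != "(Intercept)", ]            # drop the intercept row
}

# Illustrative model on a built-in data set
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
d_fit <- coef_frame(fit)
print(d_fit)

# Draw the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_fit <- ggplot(d_fit, aes(x = x, y = y, ymin = ylo, ymax = yhi)) +
    geom_pointrange() +
    geom_hline(yintercept = 0, linetype = 2) +
    coord_flip() +
    xlab("Variable")
  print(p_fit)
}
```

Coefficients measured on very different scales will share one axis here, so standardising the predictors first may make the plot easier to read.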
Abhijit,
Thanks for the mention and nice post. I think the ggplot2 solution to this is really simpler than anything we can do with lattice. Unfortunately, I’m still procrastinating about learning ggplot2 well enough for everyday use. Examples like this make it easier, I think.
-Matt
Matt,
I learned a lot from our useR meetup talk this month by Harlan Harris (available at the DC useR meetup site, and also on Harlan’s blog). I learned lattice first and was partial to it for a long time, mainly due to its quickness compared to ggplot. However, once I realized the “logic” behind creating plots in ggplot, it seemed very easy and flexible. The speed thing still irks me, but for smallish data sets it’s my preferred graphics platform in R now.
Abhijit
Abhijit,
Thanks for the excellent post. This can be especially useful in genetics. Would you care if I slightly modified the code and reposted to my own blog, citing this post of course?
Stephen.
Stephen,
Feel free to modify and post this code. It’s pretty rough, and not production quality. It’s really proof-of-principle code, and very easy code at that.
Abhijit
Hi Steve, have you been able to code it to obtain a “diamond” for the summary of all data? I have been struggling on this for quite some time now.
Looks great, but which is easier for someone who’s not versed in either package?
Well, if you use either Matt’s function or mine (mine would need some more polish, and the Getting Genetics Done blog might have a more polished version soon), it is equally easy right now, since both are presented as generic R functions, and the inner workings would be hidden to the user 🙂
Both graphics engines have a learning curve. lattice is closer in syntactical spirit to base graphics and a lot of R’s formula and conditioning syntax in general, so it might have a more familiar feel for a newcomer. ggplot2 has syntax that is not quite so straightforward (and abuses some conventions), but it has made some sounder graphical choices and makes it easier to build customizable graphs once you understand the logic of building layers to make up a graph (in other words, the grammar). I started with lattice for many years and resisted moving to ggplot2, but now, I tend to use ggplot2 more for my quick graphing needs. If something doesn’t work in ggplot2, I will go back to lattice or base graphics.
Deepayan Sarkar’s book on lattice is a pretty good starting point. Hadley Wickham’s book and website on ggplot2 are good references, but I’d look for online tutorials (like the DC, Bay Area and New York R meetup sites, or the Learn R blog) to get an initial start.
Hi Abiji,
I am wondering if I can add a table with the betas at the side of this graph. Is there any link you can provide that would be useful for adding more details to a forest plot for publication?
thanks,
SD
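One possible approach to showing the estimates beside the intervals, sketched here under assumptions, is to add a text layer with geom_text. The data, the `label` column, and the label position `label_pos` are all made up for illustration; this is a rough starting point rather than a publication-ready table.

```r
# Made-up data in the same layout as the post's dummy example
set.seed(2)
d_lab <- data.frame(x = toupper(letters[1:5]), y = rnorm(5, 0, 0.1))
d_lab <- transform(d_lab, ylo = y - 0.1, yhi = y + 0.1)

# Text to print beside each interval: estimate (lower, upper)
d_lab$label <- sprintf("%.2f (%.2f, %.2f)", d_lab$y, d_lab$ylo, d_lab$yhi)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  label_pos <- max(d_lab$yhi) + 0.05   # just beyond the widest interval
  p_lab <- ggplot(d_lab, aes(x = x, y = y, ymin = ylo, ymax = yhi)) +
    geom_pointrange() +
    geom_hline(yintercept = 0, linetype = 2) +
    # fixed y position puts all labels in one column after coord_flip()
    geom_text(aes(label = label), y = label_pos, hjust = 0, size = 3) +
    expand_limits(y = label_pos + 0.3) +  # leave room for the text
    coord_flip() +
    xlab("Variable")
  print(p_lab)
}
```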
Hi everyone,
The forest plot is very nice. However, I have a question: when I tried it, it displayed at most 24 variable names on the forest plot. Would you please tell me how to display more than 24 variable names on a forest plot?
Thanks a lot!
Liming
Hi Abhijit,
Thanks for the magical code. It works great! I used your and Matt’s code to do my forest plot available here: (https://www.dropbox.com/s/76im4sqo9y6qvws/Metanalysis.tiff?dl=0). Now, I want to extend my horizon a bit by including 4 parameters (represented via legends) within each study. Is it possible to achieve this by adding something to your code? Like in ggplot2 we know we have fill=parameter.
Thanks,
Akshay
Well, 4 parameters in a plot is difficult in the best of times. My solution uses ggplot2, so all the stuff you can do in ggplot2 is applicable. All I’m really using is geom_pointrange as the geom. If you want to add fill, color, and other kinds of parameters, it should be pretty straightforward.
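As a rough illustration of that idea, the sketch below maps a grouping column to colour and dodges the ranges so the four parameters within each study sit side by side. The data, the column name `parameter`, and the dodge width are assumptions made up for this example.

```r
# Made-up data: 3 studies, each with 4 parameters (values are illustrative)
set.seed(1)
d4 <- expand.grid(
  x = c("Study 1", "Study 2", "Study 3"),
  parameter = c("P1", "P2", "P3", "P4")
)
d4$y   <- rnorm(nrow(d4), 0, 0.1)
d4$ylo <- d4$y - 0.1
d4$yhi <- d4$y + 0.1

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p4 <- ggplot(d4, aes(x = x, y = y, ymin = ylo, ymax = yhi,
                       colour = parameter)) +
    # position_dodge() offsets the four ranges so they do not overlap
    geom_pointrange(position = position_dodge(width = 0.6)) +
    geom_hline(yintercept = 0, linetype = 2) +
    coord_flip() +
    xlab("Study")
  print(p4)
}
```

The colour mapping also produces the legend automatically, one entry per parameter.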
Hi Akshay, could you kindly provide me with your R code to produce this pretty forest plot? I really need it to better explain the results of a Poisson glm fit.
Many thanks in advance for your support!
Hi Daniele, Thanks for your kind words. You can find my code here: https://gist.github.com/akshaycuhk/01576c57149a9a3d14514c9a3c4b4b1d
Basically, this is a combination of the very cool code provided by Abhijit and apatheme added to it that I found here: https://sakaluk.wordpress.com/2016/02/16/7-make-it-pretty-plots-for-meta-analysis/
Dear Abhijit,
when trying to run your script for the forest plot I get this error message
“Error: geom_hline requires the following missing aesthetics: yintercept”
Do you have any idea how to solve this problem?
Thanks in advance for your kind reply.
Daniele
Hi Daniele,
If you look at the link to the gist, it has updated code that works with the updated ggplot2 library. In case you can’t find it above, it’s https://gist.github.com/webbedfeet/7031404fc3f500f6258e
Thanks for the code, Abhijit. It saved me oodles of time in trying to figure out how to plot my ORs and their limits.