29 comments on “An enhanced Kaplan-Meier plot

  1. Re: the p-value annotation, you could simply use annotate() in ggplot,

    qplot(1, 1) + annotate(“text”, x=0.8, y=0.8, label=”testing”)

    in which case you don’t need to convert the ggplot to a grob before passing it to grid.arrange().

    Also, note that grid.draw(p) would be a better choice than grid.arrange(p) when there’s only one grob to place.

    Finally, you could perhaps use grid.table() to draw the table rather than ggplot, unless the alignment of the x axes is really important (at first sight i didn’t see why it should be).

    • Baptiste,

      I was hoping for comments from experienced hands. I never discovered annotate, so thanks.

      The alignment is actually present in almost all the examples I’ve seen, so yes it is important to my collaborator. I agree that it is not necessary.

      Thank you for taking the time to comment.

      Abhijit

      • After I posted I realised that the alignment could be useful, indeed. I don’t see why you use blank_pic however, rather than just setting the plot margin for instance, or even an empty rectGrob(). I couldn’t run the example above for some reason (presumably because df is not defined for blank_pic, and also fit is not a data.frame).

        I have a comment regarding the general structure of the function with if() else blocks. I’d suggest the following,

        ## create the main plot
        p = ggplot(…) …

        ## add the optional p value
        if(pvalue)
        p = p + annotate(…)

        ## store the plot in a list
        l = list(p)

        ## optional: create the table and append it to the list

        if(table) {
        gt = ggplot(…)
        l = c(list(gt), l)
        }

        ## finally, plot and/or return

        plots_and_options = c(l, list(clip=F, nrow=3, ncol=1,
        heights = unit(c(2, .1, .25), c(‘null’,’null’,’null’))))

        all = do.call(arrangeGrob, plots_and_options)

        if(plot)
        grid.draw(l)

        if(return)
        return(invisible(l))

        (all this untested)

        HTH,

        baptiste

      • Baptiste,

        I appreciate your comments and effort in this.

        I understand your suggested structure, which has the distinct advantage of being more modular. Modifications to my program are forthcoming:)

        Didn’t know about rectGrob(). Thanks

        fit is a survival object, the result of survival:::survfit. If you run the 3-line example in the blog (just above the code) it should work.

        Thanks again.

  2. You probably don’t need rectGrob(),

    qplot(1,1) + opts(plot.margin = unit(c(1,1,10,1), “lines”))

    I haven’t been able to run the code above; perhaps you should try it in a fresh vanilla session.

    HTH

  3. Thank you for the good work Abhijit. I tried to run the codes above. When I submitted >ggkm(fit,timeby=500) I’ve got an error message “Error: ggplot2 doesn’t know how to deal with data of class function”. Can you explain why? That will certainly help me understand ggplot a bit more. Thanks!!!

    • Kelvin, that’s because df happens to be a function of the stats package in R (the density function of the F distribution). See:
      ?df
      I also had to tweak the code a bit so it would produce the intended result in my machine:
      ggkm <- function (sfit,
      table = TRUE,
      returns = FALSE,
      xlabs = "Time",
      ylabs = "survival probability",
      ystratalabs = NULL,
      ystrataname = NULL,
      timeby = 100,
      main = "Kaplan-Meier Plot",
      pval = FALSE,
      …) {

      require (ggplot2)
      require (survival)
      require (gridExtra)

      ## Create Kaplan Meier plot
      if (is.null (ystratalabs)) {
      ystratalabs <- as.character (levels (summary (sfit)$strata))
      }
      (m <- max (nchar (ystratalabs)))
      if (is.null (ystrataname)) ystrataname <- "Strata"
      times <- seq (0, max (sfit$time), by = timeby)
      .df <- data.frame (time = sfit$time,
      n.risk = sfit$n.risk,
      n.event = sfit$n.event,
      surv = sfit$surv,
      strata = summary (sfit, censored = T)$strata,
      upper = sfit$upper,
      lower = sfit$lower)
      levels (.df$strata) <- ystratalabs
      zeros <- data.frame (time = 0,
      surv = 1,
      strata = ystratalabs,
      upper = 1,
      lower = 1)
      .df <- rbind.fill (zeros, .df)
      d <- length (levels (.df$strata))
      p <- ggplot (.df, aes (time, surv, groups = strata)) +
      geom_step (aes (linetype = strata),
      size = 0.7) +
      theme_bw () +
      opts (axis.title.x = theme_text (vjust = 0.5)) +
      scale_x_continuous (xlabs,
      breaks = times,
      limits = c (0, max (sfit$time))) +
      scale_y_continuous (ylabs, limits = c (0, 1)) +
      opts (panel.grid.minor = theme_blank ()) +
      opts (legend.position = c (ifelse (m < 10, .28, .35),
      ifelse (d < 4, .25, .35))) +
      opts (legend.key = theme_rect (colour = NA)) +
      labs (linetype = ystrataname) +
      opts (plot.margin = unit (c (0, 1, .5, ifelse (m < 10, 1.5, 2.5)),
      "lines")) +
      opts (title = main)

      ## Create a blank plot for place-holding
      ## .df <- data.frame ()
      blank.pic <- ggplot (.df, aes (time, surv)) +
      geom_blank () +
      theme_bw () +
      opts (axis.text.x = theme_blank (),
      axis.text.y = theme_blank (),
      axis.title.x = theme_blank (),
      axis.title.y = theme_blank (),
      axis.ticks = theme_blank (),
      panel.grid.major = theme_blank (),
      panel.border = theme_blank ())

      if (pval) {
      sdiff <- survdiff (eval (sfit$call$formula),
      data = eval (sfit$call$data))
      pval <- pchisq (sdiff$chisq,
      length (sdiff$n) – 1,
      lower.tail = FALSE)
      pvaltxt <- ifelse (pval < 0.0001, "p < 0.0001",
      paste ("p =", signif (pval, 3)))
      p <- p +
      annotate ("text",
      x = 0.6 * max (sfit$time),
      y = 0.1,
      label = pvaltxt)
      }

      if (table) {
      ## Create table graphic to include at-risk numbers
      risk.data <- data.frame (strata = summary (sfit,
      times = times,
      extend = TRUE)$strata,
      time = summary (sfit,
      times = times,
      extend = TRUE)$time,
      n.risk = summary (sfit,
      times = times,
      extend = TRUE)$n.risk)
      data.table <- ggplot (risk.data,
      aes (x = time,
      y = strata,
      label = format (n.risk, nsmall = 0))) + #, color = strata)) +
      geom_text (size = 3.5) +
      theme_bw () +
      scale_y_discrete (breaks = as.character (levels (risk.data$strata)),
      labels = ystratalabs) +
      # scale_y_discrete (#format1ter = abbreviate,
      # breaks = 1:3,
      # labels = ystratalabs) +
      scale_x_continuous ("Numbers at risk",
      limits = c (0, max (sfit$time))) +
      opts (axis.title.x = theme_text (size = 10, vjust = 1),
      panel.grid.major = theme_blank (),
      panel.grid.minor = theme_blank (),
      panel.border = theme_blank (),
      axis.text.x = theme_blank (),
      axis.ticks = theme_blank (),
      axis.text.y = theme_text (face = "bold",
      hjust = 1))

      data.table <- data.table +
      opts (legend.position = "none") +
      xlab (NULL) +
      ylab (NULL)

      data.table <- data.table +
      opts (plot.margin = unit (c (-1.5, 1, 0.1, ifelse (m < 10, 2.5, 3.5) – 0.28 * m), "lines"))

      ## Plotting the graphs
      ## p <- ggplotGrob (p)
      ## p <- addGrob (p, textGrob (x = unit (.8, "npc"), y = unit (.25, "npc"), label = pvaltxt,
      ## gp = gpar (fontsize = 12)))
      grid.arrange (p,
      blank.pic,
      data.table,
      clip = FALSE,
      nrow = 3,
      ncol = 1,
      heights = unit (c (2, .1, .25),
      c ("null", "null", "null")))
      if (returns) {
      a <- arrangeGrob (p,
      blank.pic,
      data.table,
      clip = FALSE,
      nrow = 3,
      ncol = 1,
      heights = unit (c (2, .1, .25),
      c ("null", "null", "null")))
      return (a)
      }
      }
      else {
      ## p <- ggplotGrob (p)
      ## p <- addGrob (p, textGrob (x = unit (0.5, "npc"), y = unit (0.23, "npc"),
      ## label = pvaltxt, gp = gpar (fontsize = 12)))
      print (p)
      if (returns) return (p)
      }
      }

    • Kelvin, that’s because “df” *happens* to be a function of the stats package in R (the density function of the F distribution). See:
      ?df
      I also had to tweak the code a bit so it would produce the intended result in my machine:
      ggkm <- function (sfit,
      table = TRUE,
      returns = FALSE,
      xlabs = "Time",
      ylabs = "survival probability",
      ystratalabs = NULL,
      ystrataname = NULL,
      timeby = 100,
      main = "Kaplan-Meier Plot",
      pval = FALSE,
      …) {

      require (ggplot2)
      require (survival)
      require (gridExtra)

      ## Create Kaplan Meier plot
      if (is.null (ystratalabs)) {
      ystratalabs <- as.character (levels (summary (sfit)$strata))
      }
      (m <- max (nchar (ystratalabs)))
      if (is.null (ystrataname)) ystrataname <- "Strata"
      times <- seq (0, max (sfit$time), by = timeby)
      .df <- data.frame (time = sfit$time,
      n.risk = sfit$n.risk,
      n.event = sfit$n.event,
      surv = sfit$surv,
      strata = summary (sfit, censored = T)$strata,
      upper = sfit$upper,
      lower = sfit$lower)
      levels (.df$strata) <- ystratalabs
      zeros <- data.frame (time = 0,
      surv = 1,
      strata = ystratalabs,
      upper = 1,
      lower = 1)
      .df <- rbind.fill (zeros, .df)
      d <- length (levels (.df$strata))
      p <- ggplot (.df, aes (time, surv, groups = strata)) +
      geom_step (aes (linetype = strata),
      size = 0.7) +
      theme_bw () +
      opts (axis.title.x = theme_text (vjust = 0.5)) +
      scale_x_continuous (xlabs,
      breaks = times,
      limits = c (0, max (sfit$time))) +
      scale_y_continuous (ylabs, limits = c (0, 1)) +
      opts (panel.grid.minor = theme_blank ()) +
      opts (legend.position = c (ifelse (m < 10, .28, .35),
      ifelse (d < 4, .25, .35))) +
      opts (legend.key = theme_rect (colour = NA)) +
      labs (linetype = ystrataname) +
      opts (plot.margin = unit (c (0, 1, .5, ifelse (m < 10, 1.5, 2.5)),
      "lines")) +
      opts (title = main)

      ## Create a blank plot for place-holding
      ## .df <- data.frame ()
      blank.pic <- ggplot (.df, aes (time, surv)) +
      geom_blank () +
      theme_bw () +
      opts (axis.text.x = theme_blank (),
      axis.text.y = theme_blank (),
      axis.title.x = theme_blank (),
      axis.title.y = theme_blank (),
      axis.ticks = theme_blank (),
      panel.grid.major = theme_blank (),
      panel.border = theme_blank ())

      if (pval) {
      sdiff <- survdiff (eval (sfit$call$formula),
      data = eval (sfit$call$data))
      pval <- pchisq (sdiff$chisq,
      length (sdiff$n) – 1,
      lower.tail = FALSE)
      pvaltxt <- ifelse (pval < 0.0001, "p < 0.0001",
      paste ("p =", signif (pval, 3)))
      p <- p +
      annotate ("text",
      x = 0.6 * max (sfit$time),
      y = 0.1,
      label = pvaltxt)
      }

      if (table) {
      ## Create table graphic to include at-risk numbers
      risk.data <- data.frame (strata = summary (sfit,
      times = times,
      extend = TRUE)$strata,
      time = summary (sfit,
      times = times,
      extend = TRUE)$time,
      n.risk = summary (sfit,
      times = times,
      extend = TRUE)$n.risk)
      data.table <- ggplot (risk.data,
      aes (x = time,
      y = strata,
      label = format (n.risk, nsmall = 0))) + #, color = strata)) +
      geom_text (size = 3.5) +
      theme_bw () +
      scale_y_discrete (breaks = as.character (levels (risk.data$strata)),
      labels = ystratalabs) +
      # scale_y_discrete (#format1ter = abbreviate,
      # breaks = 1:3,
      # labels = ystratalabs) +
      scale_x_continuous ("Numbers at risk",
      limits = c (0, max (sfit$time))) +
      opts (axis.title.x = theme_text (size = 10, vjust = 1),
      panel.grid.major = theme_blank (),
      panel.grid.minor = theme_blank (),
      panel.border = theme_blank (),
      axis.text.x = theme_blank (),
      axis.ticks = theme_blank (),
      axis.text.y = theme_text (face = "bold",
      hjust = 1))

      data.table <- data.table +
      opts (legend.position = "none") +
      xlab (NULL) +
      ylab (NULL)

      data.table <- data.table +
      opts (plot.margin = unit (c (-1.5, 1, 0.1, ifelse (m < 10, 2.5, 3.5) – 0.28 * m), "lines"))

      ## Plotting the graphs
      ## p <- ggplotGrob (p)
      ## p <- addGrob (p, textGrob (x = unit (.8, "npc"), y = unit (.25, "npc"), label = pvaltxt,
      ## gp = gpar (fontsize = 12)))
      grid.arrange (p,
      blank.pic,
      data.table,
      clip = FALSE,
      nrow = 3,
      ncol = 1,
      heights = unit (c (2, .1, .25),
      c ("null", "null", "null")))
      if (returns) {
      a <- arrangeGrob (p,
      blank.pic,
      data.table,
      clip = FALSE,
      nrow = 3,
      ncol = 1,
      heights = unit (c (2, .1, .25),
      c ("null", "null", "null")))
      return (a)
      }
      }
      else {
      ## p <- ggplotGrob (p)
      ## p <- addGrob (p, textGrob (x = unit (0.5, "npc"), y = unit (0.23, "npc"),
      ## label = pvaltxt, gp = gpar (fontsize = 12)))
      print (p)
      if (returns) return (p)
      }
      }

  4. nice code, thanks a lot.
    @Rafik, did you set pval=TRUE
    Based on Gil Thomas’ code, using the example given, the line legend had 6 strata: 1, 2, 3, rx=Lev, rx=Lev+5FU, rx=Obs
    I traced this down to a bug in the creation of zeros data.frame, fixed in the below + added roxygen comments.

    #’ Create a Kaplan-Meier plot using ggplots2
    #’
    #’ @param sfit a \code{\link[survival]{survfit}} object
    #’ @param table logical: Create a table graphic below the K-M plot, indicating at-risk numbers?
    #’ @param returns logical: if \code{TRUE}, return an arrangeGrob object
    #’ @param xlabs x-axis label
    #’ @param ylabs y-axis label
    #’ @param ystratalabs The strata labels. \code{Default = levels(summary(sfit)$strata)}
    #’ @param ystrataname The legend name. Default = “Strata”
    #’ @param timeby numeric: control the granularity along the time-axis
    #’ @param main plot title
    #’ @param pval logical: add the pvalue to the plot?
    #’ @return a ggplot is made. if return=TRUE, then an arrangeGlob object
    #’ is returned
    #’ @author Abhijit Dasgupta with contributions by Gil Tomas
    #’ \url{http://statbandit.wordpress.com/2011/03/08/an-enhanced-kaplan-meier-plot/#comment-55}
    #’ @export
    #’ @examples
    #’ \dontrun{
    #’ data(colon)
    #’ fit <- survfit(Surv(time,status)~rx, data=colon)
    #' ggkm(fit, timeby=500)
    #' }
    ggkm <- function(sfit,
    table = TRUE,
    returns = FALSE,
    xlabs = "Time",
    ylabs = "survival probability",
    ystratalabs = NULL,
    ystrataname = NULL,
    timeby = 100,
    main = "Kaplan-Meier Plot",
    pval = TRUE,
    …) {

    require(ggplot2)
    require(survival)
    require(gridExtra)

    if(is.null(ystratalabs)) {
    ystratalabs <- as.character(levels(summary(sfit)$strata))
    }
    (m <- max(nchar(ystratalabs)))
    if(is.null(ystrataname)) ystrataname <- "Strata"
    times <- seq(0, max(sfit$time), by = timeby)
    .df <- data.frame(
    time = sfit$time,
    n.risk = sfit$n.risk,
    n.event = sfit$n.event,
    surv = sfit$surv,
    strata = summary(sfit, censored = T)$strata,
    upper = sfit$upper,
    lower = sfit$lower
    )
    levels(.df$strata) <- ystratalabs
    zeros <- data.frame(time = 0,
    surv = 1,
    strata = factor(ystratalabs, levels=levels(.df$strata)),
    upper = 1,
    lower = 1
    )
    .df <- rbind.fill(zeros, .df)
    d <- length(levels(.df$strata))
    p <- ggplot(
    .df,
    aes(time, surv, groups = strata)
    ) +
    geom_step(aes(linetype = strata), size = 0.7) +
    theme_bw() +
    opts(axis.title.x = theme_text(vjust = 0.5)) +
    scale_x_continuous(
    xlabs,
    breaks = times,
    limits = c(0, max(sfit$time))
    ) +
    scale_y_continuous(
    ylabs,
    limits = c(0, 1)
    ) +
    opts(panel.grid.minor = theme_blank()) +
    opts(
    legend.position = c(
    ifelse(m < 10, .28, .35),
    ifelse(d < 4, .25, .35)
    )
    ) +
    opts(legend.key = theme_rect(colour = NA)) +
    labs(linetype = ystrataname) +
    opts(
    plot.margin = unit(
    c(0, 1, .5,
    ifelse(m < 10, 1.5, 2.5)
    ),
    "lines"
    )
    ) +
    opts(title = main)

    ## Create a blank plot for place-holding
    ## .df <- data.frame()
    blank.pic <- ggplot(.df, aes(time, surv)) +
    geom_blank() +
    theme_bw() +
    opts(
    axis.text.x = theme_blank(),
    axis.text.y = theme_blank(),
    axis.title.x = theme_blank(),
    axis.title.y = theme_blank(),
    axis.ticks = theme_blank(),
    panel.grid.major = theme_blank(),
    panel.border = theme_blank()
    )

    if(pval) {
    sdiff <- survdiff(eval(sfit$call$formula),
    data = eval(sfit$call$data))
    pval <- pchisq(
    sdiff$chisq,
    length(sdiff$n) – 1,
    lower.tail = FALSE
    )
    pvaltxt <- ifelse(
    pval < 0.0001,
    "p < 0.0001",
    paste("p =", signif(pval, 3))
    )
    p <- p + annotate("text",
    x = 0.6 * max(sfit$time),
    y = 0.1,
    label = pvaltxt
    )
    }

    if(table) {
    ## Create table graphic to include at-risk numbers
    risk.data <- data.frame(
    strata = summary(
    sfit,
    times = times,
    extend = TRUE
    )$strata,
    time = summary(
    sfit,
    times = times,
    extend = TRUE
    )$time,
    n.risk = summary(
    sfit,
    times = times,
    extend = TRUE
    )$n.risk
    )
    data.table <- ggplot(
    risk.data,
    aes(x = time, y = strata, label = format(n.risk, nsmall = 0))
    ) + #, color = strata)) +
    geom_text(size = 3.5) +
    theme_bw() +
    scale_y_discrete(
    breaks = as.character(levels(risk.data$strata)),
    labels = ystratalabs
    ) +
    # scale_y_discrete(#format1ter = abbreviate,
    # breaks = 1:3,
    # labels = ystratalabs) +
    scale_x_continuous(
    "Numbers at risk", limits = c(0, max(sfit$time))
    ) +
    opts(
    axis.title.x = theme_text(size = 10, vjust = 1),
    panel.grid.major = theme_blank(),
    panel.grid.minor = theme_blank(),
    panel.border = theme_blank(),
    axis.text.x = theme_blank(),
    axis.ticks = theme_blank(),
    axis.text.y = theme_text(
    face = "bold",
    hjust = 1
    )
    )

    data.table <- data.table +
    opts(legend.position = "none") +
    xlab(NULL) +
    ylab(NULL)

    data.table <- data.table +
    opts(
    plot.margin = unit(c(-1.5, 1, 0.1, ifelse(m < 10, 2.5, 3.5) – 0.28 * m), "lines")
    )

    ## Plotting the graphs
    ## p <- ggplotGrob(p)
    ## p <- addGrob(p, textGrob(x = unit(.8, "npc"), y = unit(.25, "npc"), label = pvaltxt,
    ## gp = gpar(fontsize = 12)))
    grid.arrange(
    p,
    blank.pic,
    data.table,
    clip = FALSE,
    nrow = 3,
    ncol = 1,
    heights = unit(
    c(2, .1, .25),
    c("null", "null", "null")
    )
    )
    if(returns) {
    a <- arrangeGrob(
    p,
    blank.pic,
    data.table,
    clip = FALSE,
    nrow = 3,
    ncol = 1,
    heights = unit(c(2, .1, .25),
    c("null", "null", "null"))
    )
    return(a)
    }
    }
    else {
    ## p <- ggplotGrob(p)
    ## p <- addGrob(p, textGrob(x = unit(0.5, "npc"), y = unit(0.23, "npc"),
    ## label = pvaltxt, gp = gpar(fontsize = 12)))
    print(p)
    if(returns) return(p)
    }
    }

    • @Mark Crowley: Thank you for polishing this up. I’m quite happy that this code (and the modifications by Gil and yourself) have been very useful to the R community. I’ll modify the main post later tonight to reflect the changes. Thanks specially for the roxygen preamble.

  5. Adding:

    geom_point(shape=3,aes(size=(!is.na(n.event) & n.event==0), linetype = strata)) +
    scale_size_manual(values = c(NA,2), legend = F) +

    to the ggplot function line adds the censoring points. This is messy and ends up introducing warnings for all data that isn’t censored e.g.
    “Removed 844 rows containing missing values (geom_point)”

    I couldn’t work out anyway to only plot the censored points

  6. When I run the script I have an error:
    + if(pval) {
    + sdiff <- survdiff(eval(sfit$call$formula), data = eval(sfit$call$data))
    + pval <- pchisq(sdiff$chisq, length(sdiff$n) – 1, lower.tail = FALSE )
    Erreur : entrée inattendu(e) dans :
    " sdiff <- survdiff(eval(sfit$call$formula), data = eval(sfit$call$data))
    pval pvaltxt <- ifelse(pval < 0.0001, "p < 0.0001", paste("p =", signif(pval, 3)))
    Erreur dans ifelse(pval < 1e-04, "p p }
    Erreur : ‘}’ inattendu(e) dans “}”
    and cannot see where it comes from.

    Any help appreciated. Thanks JL (abitbol{at}sent.com)

    • Very nice. R does have several approaches for doing the same thing, so the more the merrier. Like some of your other code too :)

  7. Hi guys,
    Cool stuff happening here! I have a question!
    If in one of sruvivil fit groups the event is not happening , your code returnes with error !
    Here is the error
    Error in `levels<-.factor`(`*tmp*`, value = "age2group=< 16 years") :
    number of levels differs
    How can I fix this.
    Thanks

  8. Hi Abhijit

    Nice work, however for some reason the code in the box above has an error. On line 046 the aesthetics of ggplot are presently … ggplot(.df, aes( time, surv, groups = strata)) + … this should read …ggplot(.df, aes( time, surv, group = strata)) +…

    With that it works really nicely – thanks

    Mark

  9. Great work! nevertheless, if the data has no strata, your function gives an error. for instance:

    data(colon)

    fit <- survfit(Surv(time,status)~1, data=colon)

    ggkm(fit, timeby=500)

    would give you :

    Error in data.frame(time = sfit$time[subs2], n.risk = sfit$n.risk[subs2], :
    arguments imply differing number of rows: 1, 0
    In addition: Warning message:
    In max(nchar(ystratalabs)) :
    no non-missing arguments to max; returning -Inf

    I believe an if statement would be the yeoman solution for this.

  10. Hey,

    After using the ggkm function, I get this error:
    Error in `[.data.frame`(data, “group”) : undefined columns selected

    My data consists of two groups, and the plot() function in R looks good.

    How can I fix this error?

    Thanks in advance.
    Petra

  11. Pingback: Kaplan-Meier plots using ggplots2 (updated) | Stat Bandit

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s