RStudio

Quirks about running Rcpp on Windows through RStudio

Quirks about running Rcpp on Windows through RStudio

This is a quick note about some tribulations I had running Rcpp (v. 0.12.12) code through RStudio (v. 1.0.143) on a Windows 7 box running R (v. 3.3.2). I also have RTools v. 3.4 installed. I fully admit that this may very well be specific to my box, but I suspect not.

I kept running into problems with Rcpp complaining that (a) RTools wasn’t installed, and (b) the C++ compiler couldn’t find Rcpp.h. First, devtools::find_rtools was giving a positive result, so (a) was not true. Second, I noticed that the wrong C++ compiler was being called. Even more frustrating was the fact that everything was working if I worked on a native R console rather than RStudio. So there was nothing inherently wrong with the code or setup, but rather the environment RStudio was creating.

After some searching the interwebs and StackOverflow, the following solution worked for me. I added the following lines to my global .Rprofile file:

Sys.setenv(PATH = paste(Sys.getenv("PATH"), "C:/RBuildTools/3.4/bin/",
            "C:/RBuildTools/3.4/mingw_64/bin", sep = ";"))
Sys.setenv(BINPREF = "C:/RBuildTools/3.4/mingw_64/bin/")

Note that C:/RBuildTools is the default location suggested when I installed RTools.

This solution is indicated here, but I have the reverse issue of the default setup working in R and not in the latest RStudio. However, the solution still works!!

Note that instead of putting it in the global .Rprofile, you could put it in a project-specific .Rprofile, or even in your R code as long as it is run before loading the Rcpp or derivative packages. Note also that if you use binary packages that use Rcpp, there is no problem. Only when you’re compiling C++ code either for your own code or for building a package from source is this an issue. And, as far as I can tell, only on Windows.

Hope this prevents someone else from 3 hours of heartburn trying to make Rcpp work on a Windows box. And, if this has already been fixed in RStudio, please comment and I’ll be happy to update this post.

 

The need for documenting functions

My current work usually requires me to work on a project until we can submit a research paper, and then move on to a new project. However, 3-6 months down the road, when the reviews for the paper return, it is quite common to have to do some new analyses or re-analyses of the data. At that time, I have to re-visit my code!

One of the common problems I (and I’m sure many of us) have is that we tend to hack code and functions with the end in mind, just getting the job done. However, when we have to re-visit code when it’s no longer fresh in our memory, it takes significant time to decipher what some code snippet or function is doing, or why we did it in the first place. Then, when our paper gets published and a reader wants our code to try, it’s a bear getting it into any kind of shareable form. Recently I’ve had both issues, and decided, enough was enough!!

R has a fantastic package roxygen2 that makes documenting functions quite easy. The documentation sits just above the function code, so it is there front and center. Taking 2-5 minutes to write even a bare-bones documentation, that includes

  • what the function does
  • what are the inputs (in English) and their required R class
  • what is the output and its R class
  • maybe one example

makes the grief of re-discovering the function and trying to decipher it go away. What does this look like? Here’s a recent example from my files:

#' Find column name corresponding to a particular functional
#'
#' The original data set contains very long column headers. This function
#' does a keyword search over the headers to find those column headers that
#' match a particular keyword, e.g., mean, median, etc.
#' @param x The data we are querying (data.frame)
#' @param v The keyword we are searching for (character)
#' @param ignorecase Should case be ignored (logical)
#' @return A vector of column names matching the keyword
#' @export
findvar <- function(x,v, ignorecase=TRUE) {
  if(!is.character(v)) stop('name must be character')
  if(!is.data.frame(x)) stop('x must be a data.frame')
  v <- grep(v,names(x),value=T, ignore.case=ignorecase)
  if(length(v)==0) v <- NA
  return(v)
}

My code above might not meet best practices, but it achieves two things for me. It reminds me of why I wrote this function, and tells me what I need to run it. This particular snippet is not part of any R package (though I could, with my new directory structure for projects, easily create a project-specific package if I need to). Of course this type of documentation is required if you are indeed writing packages.  

Update: As some of you have pointed out, the way I’m using this is as a fancy form of commenting, regardless of future utility in packaging. 100% true, but it’s actually one less thing for me to think about. I have a template, fill it out, and I’m done, with all the essential elements included. Essentially this creates a “minimal viable comment” for a function, and I only need to look in one place later to see what’s going on. I still comment my code, but this still gives me value for not very much overhead.

Resources

There are several resources for learning about roxygen2. First and foremost is the chapter Documenting functions from Hadley Wickham’s online book. roxygen2 also has its own tag on StackOverflow.

On the software side, RStudio supports roxygen2; see here. Emacs/ESS also has extensive roxygen2 support. The Rtools package for Sublime Text provides a template for roxygen2 documentation. So getting started in the editor of your choice is not a problem.

RStudio 0.94.92 visited

I just updated my RStudio version to the latest, v.0.94.92 (will this asymptotically approach 1, or actually get to 1?). It was nice to see the number of improvements the development team has implemented, based I’m sure on community feedback. The team has, in my experience, been extraordinarily responsive to user feedback, and I’m sure this played a large part in the development path taken by the team. 

First and foremost, I was happy to see most of my wants met in this version:

  • There now is a keyboard shortcut for <- that is easy and intuitive (Alt+_/Option+_)
  • The File window now allows sorting by modification date  in addition to name, which was becoming an issue for one of my projects
  • Plots can be saved as BMP, TIFF, JPEG and Postscript in addition to PNG and PDF
  • Bracket completion and matching, very much similar to the R Mac GUI, and actually better than Emacs/ESS, specially when deleting.
  • An easy shortcut to repeat blocks of text or transpose two lines of text (though this appears mistakenly overloaded with another shortcut on Windows/Linux)
  • Keyboard shortcuts are reasonably consistent with OS-specific shortcuts, though the Ctrl key is used in Mac more than generally seen in the OS. It is however convenient for those of us migrating from Emacs/ESS, who use the Ctrl key often. 

My wishlist for RStudio is pretty much fulfilled with respect to R development. However, a few improvements need to be made in the TeX/Sweave interface to allow for autocompletion, templates, and fuller functionality in line with Emacs/Auctex and Texmate. Currently writing LaTeX and Sweave feels like writing in Wordpad, albeit with R-specific word completion and R functionality. This can be a bit more polished. Of course TeX and Sweave are still used by a minority of R users, so the fact that this functionality hasn’t developed is no surprise. 

All in all, the current version of RStudio feels like a very usable IDE for R, and certain features and similarities make migrating from Emacs pretty easy (provided you don’t miss Emacs’ overall power and flexibility too much)