Stat Bandit

Musings on statistics, computation and data research

SAS, R and categorical variables

One of my recurring disappointments with SAS (which I am using because I need PROC MIXED for some analyses) is how awkward it is to recode categorical variables so that a particular level is the reference category. In R, my usual tool, this is easy both to set and to modify using the relevel command available in base R (in the stats package). My understanding is that this is actually easy in SAS for GLM, PHREG and some others, but not in PROC MIXED. (Once again I face my pet peeve about the inconsistencies within a leading commercial product and market “leader” like SAS.) The easiest way to deal with this, I believe, is to create the dummy variables by hand using IF-THEN/ELSE statements in a DATA step and use them in the model rather than the categorical variables themselves. If most of the covariates are not categorical, this isn’t too burdensome.
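For comparison, here is a minimal sketch of the R side, assuming a made-up data frame `dat` with a three-level factor `trt` and a numeric outcome `y` (all of these names and values are hypothetical and purely illustrative). It sets the reference category with relevel, then builds the equivalent dummy variables by hand, which is the same idea as the workaround I describe for PROC MIXED.

```r
# Hypothetical data: a three-level treatment factor and a numeric outcome.
dat <- data.frame(
  trt = factor(c("A", "B", "C", "A", "B", "C")),
  y   = c(1.0, 2.1, 2.9, 1.2, 1.9, 3.1)
)

# By default the first level in sort order ("A") is the reference category.
levels(dat$trt)

# relevel() moves the chosen level to the front, making it the reference.
dat$trt_refC <- relevel(dat$trt, ref = "C")
levels(dat$trt_refC)

# Hand-made dummy variables with "C" as the implied reference category.
dat$trt_A <- ifelse(dat$trt == "A", 1, 0)
dat$trt_B <- ifelse(dat$trt == "B", 1, 0)

# Both codings describe the same model, so the coefficients should agree.
fit_factor <- lm(y ~ trt_refC, data = dat)
fit_dummy  <- lm(y ~ trt_A + trt_B, data = dat)
coef(fit_factor)
coef(fit_dummy)
```

With the hand-made dummies, changing the reference category means rebuilding the indicator columns, which is why this gets tedious once there are many categorical covariates.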

I’m sure some SAS guru will comment on the elegant or “right” solution to this problem.


3 responses to “SAS, R and categorical variables”

  1. Ken July 13, 2011 at 8:35 PM

    We cover this here:

    My understanding is that different procedures are the responsibilities of different groups at SAS– not unlike the way that some very important methods in R are developed by different groups. In any sufficiently large enterprise, it’s probably impossible to ensure a uniform approach.

  2. Abhijit July 13, 2011 at 11:05 PM


    First of all, kudos for your and Nick’s wonderful blog.

    That is quite a valid point, especially in a huge enterprise like SAS. Even R has its idiosyncrasies, as you well know. However, manipulating categorical variables is a pretty fundamental data management task. PROC MIXED itself is good and popular, even among the non-SAS-philes, so why such a fundamental data manipulation would be ignored in a very popular PROC befuddles me.

  3. kenkleinman May 17, 2012 at 1:23 PM

    Thanks for those kind words. I completely agree with you. There are a few procs which have a sensible (and fairly broad) set of options for parameterizing categorical variables, and this ought to be adopted by all procs, IMO. OTOH, it wasn’t too long ago that there was no class statement for logistic regression– all categorical variables had to be recoded by hand. So– progress may be slow, especially when the code is not written by volunteers, but it does come eventually.
