One of the products I produce for my microarray clients is web sites which graphically and tabularly summarize the results, including annotation of the transcripts using online databases which are linked to the page. It is not unusual, after pathway analysis, to have 100s of pages. Wonder if there is a python templating framework that can make shorter order of this than I’m doing now.
Recently I have been processing Affymetrix Exon 1.0 arrays for a collaborator. Our aim is to associate gene expression with chemotherapy-related survival. For this , as an initial pass, I’m doing gene-by-gene Cox regression with genes as a single covariate. The problem using R directly is that the data file is HUGE (we’re measuring over 1M genes here). I’ve adapted my usual R-based pipeline to use python instead. Python is much faster at reading large data sets into memory, as well as looping over the data set. I use Gautier’s excellent rpy2 interfact between R and Python to do my Cox regression in R. Basically, as I loop over the genes I create a dictionary object which gets transformed to a data.frame in R, which is the namespace on which I run coxph. I then pipe back the p-value and hazard ratio to python. Finally I write the hazard ratio and p-value, as well as the probe ID, to a file (hard drive space seems cheaper operationally than memory for storage). I then use matplotlib to generate “volcano plots” for a first visualization, and also use python to summarize this derived data.
I could get through a lot of the analysis in this Python-rpy2-R pipeline in the same time it took me to load the data into R itself.