Python and Excel

Excel is unfortunately the lingua franca of data delivery (at least in small amounts) from my collaborators. Often I have to merge several disparate bits of information from several Excel files together.  I used to do this using R, since that’s what I’ve known for many years.

Now, of course, I’ve discovered Python!!! I fortunately discovered the excellent xlrd and xlwt packages by John Machin, and the subsequent addition of the xlutils package. You can find links for these at the newly created python-excel.com.  I found it was quite easy to read data from Excel files using xlrd, and then create Python dicts using the common ID as the keys. Merging was then merely appending, or rather extending, the appropriate lists based on the common ID using a loop or list comprehension. The new list of lists could then be easily written out to a new Excel file using xlwt. I suppose I could also convert the list of lists to a Numpy array and then directly save it into R, but that would actually require advanced thinking, now, wouldn’t it.

My other joyful experience today using Python was in a similar vein, but using the powerful and excellent re module. I’ve found that the re module is easier and appears more powerful than R”s grep tools, though both are based on regular expressions. I had a grid which had scores for a biomarker, which mimiced the actual grid the pathologist was reading from. Of course this is not the kind of data structure R likes, per se, so I tried to get the IDs and scores out using Python. It turned out that 10 lines of code utilizing powerful grouping tools in regular expressions enabled me to parse this grid-like data quickly into a dict with IDs as keys and scores as data. This was then easily merged to other clinical data on the patients to create a spreadsheet. I then used xlwt to write this out to a fresh Excel file for archiving as well as for R to read. The development of this code took maybe 3o minutes!!! Again I’m impressed by Python’s power.

Now if only I can find an easy and reliable way to convert LaTeX tables and figures into OpenOffice or Word while preserving the integrity of the document.

Advertisements

3 comments

  1. I don’t want to be spammy, but did you take a look at Resolver One? It can read Excel files, and then you can use all of the Python modules to manipulate them in the grid, then save to Excel or CSV. One of our users has managed to hook it up to R, too.

    Cheers,

    Giles

    1. Thanks for the comment. I haven’t looked at Resolver One. I’ve been pretty happy with xlrd and xlwt for a while, so I’d have to see the advantages of Resolver One over my basic pathway.

      1. xlrd and xlwt are fantastic tools, and it may well be that they’re all you need.

        The biggest advantage you might see with Resoilver One probably be that you could interact with your spreadsheet using Python in a spreadsheet-like environment. Resolver One is a spreadsheet, like Excel, except that:

        * It uses Python as a scripting language, so you can write both your formulae and the macros in Python.

        * It integrates the formulae more tightly with the code than Excel — for example, when you enter a formula, it gets converted into Python code in realtime so that you can see the equivalent code. The “macros” you write take the form of extra code that you can add around the formula code.

        Hope that all makes sense!

        Cheers,

        Giles

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s