pandas “transform” using the tidyverse

Chris Moffitt has a nice blog post on how to use the transform function in pandas. He provides some (fake) sales data and asks what fraction of each order's total comes from each SKU.

Being an R nut and a tidyverse fan, I thought I'd compare and contrast the pandas code with an implementation using the tidyverse.

First the pandas code:

import pandas as pd
dat = pd.read_excel('sales_transactions.xlsx')
dat['Percent_of_Order'] = dat['ext price']/dat.groupby('order')['ext price'].transform('sum')
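To make it clearer what transform is doing here, below is a minimal sketch on a made-up data frame. The toy values and the intermediate name order_total are assumptions for illustration only; they are not taken from the original spreadsheet. The point is that transform('sum') returns one value per row (the total for that row's order), so it lines up with the original column and can be used directly as a divisor.

```python
import pandas as pd

# Hypothetical toy data standing in for sales_transactions.xlsx
dat = pd.DataFrame({
    "order": [1, 1, 2, 2, 2],
    "sku": ["A", "B", "A", "C", "D"],
    "ext price": [30.0, 70.0, 10.0, 20.0, 50.0],
})

# transform('sum') broadcasts each order's total back to every row of that order,
# so the result has the same length and index as dat
order_total = dat.groupby("order")["ext price"].transform("sum")

# Dividing row by row gives each SKU's share of its order
dat["Percent_of_Order"] = dat["ext price"] / order_total

print(dat)
```

A quick sanity check on a result like this is that the shares within each order add up to 1.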

A similar implementation using the tidyverse:

library(readxl)  # read_excel()
library(dplyr)   # %>%, group_by(), mutate()

dat <- read_excel('sales_transactions.xlsx')
dat <- dat %>%
  group_by(order) %>%
  mutate(Percent_of_Order = `ext price`/sum(`ext price`))


  1. Hey, nice comparison. I have a question about the %>% operator. Is it a mistake (instead of %>%), or is it a new operator of yours?

  2. I find R can be more elegant and readable than Python, especially in cases like this example and when using the %>% operator. To make the code “flow” more naturally, at least for me, I have taken to using the right-side assignment operator, ->, together with the %>% operator. It creates a top-to-bottom, left-to-right flow that I find clear and easy to read.

    read_excel('sales_transactions.xlsx') %>%
    group_by(order) %>%
    mutate(Percent_of_Order = `ext price`/sum(`ext price`)) -> dat

    1. Mark, I use the right assignment operator too, though I’m not fully sold on it yet. Thanks for the comment.
