Chris Moffitt has a nice blog post on how to use the `transform` function in `pandas`. He provides some (fake) sales data and asks what fraction of each order comes from each SKU.

Being an R nut and a `tidyverse` fan, I thought I would compare and contrast the `pandas` code with an implementation using the tidyverse.

First, the `pandas` code:

import pandas as pd
dat = pd.read_excel('sales_transactions.xlsx')
dat['Percent_of_Order'] = dat['ext price']/dat.groupby('order')['ext price'].transform('sum')
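
If you don't have the spreadsheet handy, here is a minimal sketch of the same idea on a tiny data frame. The rows are made up for illustration (they are not from Chris's data), and `order_total` is just a name I picked. The point is that `transform('sum')` returns one value per original row rather than one per group, which is why the division lines up.

```python
import pandas as pd

# Made-up rows standing in for sales_transactions.xlsx (hypothetical values)
dat = pd.DataFrame({
    "order": [1, 1, 2, 2, 2],
    "sku": ["A", "B", "A", "C", "D"],
    "ext price": [30.0, 70.0, 10.0, 20.0, 70.0],
})

# transform('sum') broadcasts each order's total back onto its rows,
# so the result has the same length and index as dat
order_total = dat.groupby("order")["ext price"].transform("sum")

# Row-wise share of the order total
dat["Percent_of_Order"] = dat["ext price"] / order_total
print(dat)
```

Within each order the shares should add up to 1, which is a quick sanity check on the result.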

A similar implementation using the tidyverse:

library(tidyverse)
library(readxl)
dat <- read_excel('sales_transactions.xlsx')
dat <- dat %>%
  group_by(order) %>%
  mutate(Percent_of_Order = `ext price` / sum(`ext price`))
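
For a closer side-by-side, `pandas` can also be written as a single chain using `DataFrame.assign`, which plays roughly the role of `mutate` here. This is only a sketch under my own assumptions: the rows are invented, and `result` is a name I chose for illustration.

```python
import pandas as pd

# Made-up rows standing in for sales_transactions.xlsx (hypothetical values)
dat = pd.DataFrame({
    "order": [1, 1, 2, 2, 2],
    "sku": ["A", "B", "A", "C", "D"],
    "ext price": [30.0, 70.0, 10.0, 20.0, 70.0],
})

# assign() returns a new data frame with the extra column;
# the lambda receives the data frame as it stands at that point in the chain
result = dat.assign(
    Percent_of_Order=lambda d: d["ext price"]
    / d.groupby("order")["ext price"].transform("sum")
)
print(result)
```

With the real file, the chain could start from `pd.read_excel('sales_transactions.xlsx')` instead of the hand-built data frame. Note that `assign` leaves `dat` itself unchanged.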

Hey, nice comparison. I have a question about the `%&gt;%` operator. Is it a mistake (instead of `%>%`), or is it a new operator of yours?

It was a problem of writing offline and uploading: WordPress translated `>` to `&gt;`. Thanks for pointing it out.

I find R can be more elegant and readable than Python, especially in cases like this example and when using the `%>%` operator. To make the code “flow” more naturally, at least for me, I have taken to using the right-assignment operator, `->`, together with `%>%`. It creates a top-to-bottom, left-to-right flow, which I find clear and easy to read.

read_excel('sales_transactions.xlsx') %>%
  group_by(order) %>%
  mutate(Percent_of_Order = `ext price` / sum(`ext price`)) -> dat

Mark, I use the right assignment operator too, though I’m not quite sold on it fully yet. Thanks for the comment.