R - Unexpected behavior in dplyr::group_by_ and dplyr::summarise_


I wrote a little function to find the R-squared value of a regression performed on two variables in the mtcars data set, which is included in R by default:

get_r_squared = function(x) summary(lm(mpg ~ hp, data = x))$r.squared 

It seems to work as expected when I give it the full data set:

get_r_squared(mtcars)
# [1] 0.6024373

However, if I try to use it as part of a dplyr pipeline on subsets of the data, it returns the same answer as above three times, when I expected it to return a different value for each subset.

library(dplyr)

mtcars %>%
  group_by_("cyl") %>%
  summarise_(r_squared = get_r_squared(.))

## Source: local data frame [3 x 2]
##
##   cyl r_squared
## 1   4 0.6024373
## 2   6 0.6024373
## 3   8 0.6024373

I was expecting these values instead:

sapply(
  unique(mtcars$cyl),
  function(cyl) {
    get_r_squared(mtcars[mtcars$cyl == cyl, ])
  }
)
# [1] 0.01614624 0.27405583 0.08044919

I've confirmed that this is not a plyr namespace issue: that package is not loaded.

search()
##  [1] ".GlobalEnv"        "package:knitr"     "package:dplyr"
##  [4] "tools:rstudio"     "package:stats"     "package:graphics"
##  [7] "package:grDevices" "package:utils"     "package:datasets"
## [10] "package:methods"   "Autoloads"         "package:base"

I'm not sure what's going on here. Is it related to nonstandard evaluation in the lm function? Or am I misunderstanding how group_by works? Or perhaps something else?

I think you've misunderstood how summarise() works - it doesn't do anything with ., and the fact that it works at all is just a happy chance. Instead, try this:

library(dplyr)

get_r_squared <- function(x, y) summary(lm(x ~ y))$r.squared

mtcars %>%
  group_by(cyl) %>%
  summarise(r_squared = get_r_squared(mpg, wt))
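As a rough check, here is a sketch of the same approach using hp, the predictor from the question, instead of wt. It assumes that the . in the original pipeline stood for the whole grouped data frame coming out of group_by_(), which would explain why every group got the full-data value. The names by_cyl, check and full_fit are only illustrative.

```r
library(dplyr)

# Same helper as in the answer: R-squared of a simple regression of x on y.
get_r_squared <- function(x, y) summary(lm(x ~ y))$r.squared

# Inside summarise(), mpg and hp refer to the rows of the current group only.
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(r_squared = get_r_squared(mpg, hp))

# Base R cross-check, one fit per cyl value (sorted to match group order).
check <- sapply(
  sort(unique(mtcars$cyl)),
  function(cyl) {
    summary(lm(mpg ~ hp, data = mtcars[mtcars$cyl == cyl, ]))$r.squared
  }
)

# Passing the whole grouped data frame to lm() ignores the grouping,
# so this is simply the fit on all rows of mtcars.
full_fit <- summary(lm(mpg ~ hp, data = group_by(mtcars, cyl)))$r.squared

all.equal(by_cyl$r_squared, check)
all.equal(full_fit, summary(lm(mpg ~ hp, data = mtcars))$r.squared)
```

If both comparisons come back TRUE, the per-group values should line up with the ones the question expected (just ordered by cyl), and the repeated value in the original output is the fit on the full data set.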
