Unexpected behavior when subsetting in the aggregate function in R
I have a data frame in the following format:

manufacturer  pricegroup  leads
harley        <2500       #
honda         <5000       #
...           ...         ..

I am using the aggregate function to pull out data in the following way:
aggregate(leads ~ manufacturer + pricegroup, data = leaddata, FUN = sum, subset = (manufacturer == c("honda", "harley")))
I noticed that this does not return the correct totals. The numbers for each manufacturer get smaller and smaller the more manufacturers I add to the subset group. However, if I use:
aggregate(leads ~ manufacturer + pricegroup, data = leaddata, FUN = sum, subset = (manufacturer == "honda" | manufacturer == "harley"))
it returns the correct numbers. For the life of me, I can't figure out why. I would just use the OR operator, except that I am passing the list of manufacturers in dynamically. Any thoughts on why the first construct is not working? Better yet, any thoughts on how to make it work? Thanks!
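The problem is that == alternates between the values "honda" and "harley", comparing each one only with the value in the corresponding position of the "manufacturer" variable. On the other hand, %in% (as suggested by MrFlick) and | check across the entire "manufacturer" variable before deciding which values to mark. In other words, == recycles the shorter vector to the length of the one it is being compared with, and then compares element by element.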
This might be easier to see with an example:

set.seed(1)
v1 <- sample(letters[1:5], 10, TRUE)
v2 <- c("a", "b")  ## Will be recycled to rep(c("a", "b"), 5) when compared with v1
## (the sampled values below may differ across R versions)
data.frame(v1, v2,
           `==` = v1 == v2,
           `%in%` = v1 %in% v2,
           `|` = v1 == "a" | v1 == "b",
           check.names = FALSE)
#    v1 v2    ==  %in%     |
# 1   b  a FALSE  TRUE  TRUE
# 2   b  b  TRUE  TRUE  TRUE
# 3   c  a FALSE FALSE FALSE
# 4   e  b FALSE FALSE FALSE
# 5   b  a FALSE  TRUE  TRUE
# 6   e  b FALSE FALSE FALSE
# 7   e  a FALSE FALSE FALSE
# 8   d  b FALSE FALSE FALSE
# 9   d  a FALSE FALSE FALSE
# 10  a  b FALSE  TRUE  TRUE
Notice that in the == column, the value is TRUE only where "v1" and the recycled value of "v2" in the same position are the same.
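For the dynamic case raised in the question, here is a minimal sketch of the %in% approach applied to aggregate. Only the column names come from the question; the toy values in leaddata and the vector name wanted are made up for illustration.

```r
# Toy data: the values are invented for illustration only.
leaddata <- data.frame(
  manufacturer = c("harley", "honda", "honda", "harley", "yamaha", "honda"),
  pricegroup   = c("<2500", "<2500", "<5000", "<5000", "<2500", "<5000"),
  leads        = c(1, 2, 3, 4, 5, 6)
)

# Hypothetical vector of manufacturers, e.g. built at run time.
wanted <- c("honda", "harley")

# == recycles `wanted` along the column, so each row is tested against
# only one of the two values and some matching rows are silently dropped.
by_eq <- aggregate(leads ~ manufacturer + pricegroup, data = leaddata,
                   FUN = sum, subset = manufacturer == wanted)

# %in% tests every row against the whole set of wanted values.
by_in <- aggregate(leads ~ manufacturer + pricegroup, data = leaddata,
                   FUN = sum, subset = manufacturer %in% wanted)

sum(by_eq$leads)  # 7:  only rows 3 and 4 happen to line up
sum(by_in$leads)  # 16: every honda and harley row is counted
```

Because the number of rows here is a multiple of length(wanted), R gives no warning about the recycling, which is part of what makes the == version easy to miss. When the lengths do not divide evenly, R warns that the longer object length is not a multiple of the shorter object length.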