r - Selecting specific rows from a large dataset using column values
I have a large data set (about 2000 rows, 38 columns) that looks like this (there are missing data in some columns):
    species crab  cmass gill gmass    treatment months  avglw avgils
222      cm   65 34.273    p 0.198 newtons cove      0 68.108 93.181
223      cm   57 33.506    p 0.166 newtons cove      0 37.908 39.683
225      cm   65 34.273    p 0.198 newtons cove      0 68.108 93.181
231      cm   62 30.852    p 0.147 newtons cove      0 37.285 89.823
239      cm   65 34.273    p 0.198 newtons cove      0 68.108 93.181
240      cm   57 33.506    p 0.166 newtons cove      0 37.908 39.683
241      cm   62 30.852    p 0.147 newtons cove      0 37.285 89.823
242      cm   63 22.456    p 0.093 newtons cove      0 70.005 67.687
243      cm   59 22.422    p 0.113 newtons cove      0 21.834 39.481
There are multiple rows for each crab number, and I would like to be able either to average the rows for each crab number or to select the first unique row for each crab number and exclude the subsequent rows.
For example: average rows 222, 225, and 239 for crab '65'; or: select row 222 and exclude 225 and 239, since crab '65' has already been selected.
I have tried using unique() and sqldf(), but neither has worked for me.
Any advice would be appreciated. Thanks!
For the average, you might want to try putting your data in a data.table and then applying a function:
    library(data.table)
    mydata <- data.table(mydata)
    # mean of cmass and gmass within each crab number
    mydata[, lapply(.SD, mean), .SDcols = c("cmass", "gmass"), by = "crab"]
assuming you want to obtain the average of cmass and gmass.
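Since the question mentions missing data, here is a minimal sketch of the same grouped mean on a small made-up sample. The sample values are modelled on the rows in the question (with one NA added for illustration), and passing na.rm = TRUE is an assumption about how missing values should be treated, not something the question specifies.

```r
library(data.table)

# Hypothetical sample modelled on the question's rows; the NA is added for illustration
mydata <- data.table(
  crab  = c(65, 57, 65, 62, 65),
  cmass = c(34.273, 33.506, 34.273, 30.852, NA),
  gmass = c(0.198, 0.166, 0.198, 0.147, 0.198)
)

# Mean of each selected column per crab; na.rm = TRUE skips missing values
avg <- mydata[, lapply(.SD, mean, na.rm = TRUE),
              by = crab, .SDcols = c("cmass", "gmass")]
print(avg)
```

Without na.rm = TRUE, any crab with a missing value in a column would get NA as its mean for that column.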
For the other part of your question, I'm not sure. Try setting the key on the column you are interested in and then call unique:
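    setkey(mydata, crab)          # sorts mydata by crab
    # by = "crab" is needed in data.table >= 1.9.8, where unique() otherwise compares all columns
    unique(mydata, by = "crab")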
It will sort by crab, and unique will remove rows with duplicate values of crab. Is that what you want?
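As an alternative sketch that needs no extra packages, base R's duplicated() can select the first row for each crab number. The data frame below is a made-up sample modelled on the question's rows, not the asker's actual data.

```r
# Hypothetical sample modelled on the rows in the question
mydata <- data.frame(
  crab  = c(65, 57, 65, 62, 65, 57),
  cmass = c(34.273, 33.506, 34.273, 30.852, 34.273, 33.506),
  row.names = c("222", "223", "225", "231", "239", "240")
)

# duplicated() flags every repeat of a crab value after its first occurrence,
# so negating it keeps only the first row for each crab
first_rows <- mydata[!duplicated(mydata$crab), ]
print(first_rows)
```

Unlike the setkey approach, this keeps the rows in their original order: here rows 222, 223, and 231 are kept, and 225, 239, and 240 are dropped.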