Calculation with Nested loops in R -
i have geocode dataset 3 columns: latitude, longitude , cluster. calculated average center of clusters , store results in 2 lists center_lat , center_lon.
now wanna calculate distance each observation(3000+) each cluster center(30) haversine formula. 3000 30 matrix.
i tried use nested loop, got same distance clusters. here's code.
for (i in 1:n){ (k in 1:c){ lat1=radians(geocode[i,1]) lon1=radians(geocode[i,2]) lat2=radians(center_lat[k,2]) lon2=radians(center_lon[k,2]) } r <- 3958.756 # earth mean radius [miles] dist_mat[i,] <- acos(sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2) * cos(lon2-lon1)) * r } i'm thinking using lapply substitute nested loop. i'm not sure how use function... appreciated.
# convert radian radians = function(theta=0){return(theta * pi / 180)} # calculates geodesic distance each property center of it's current cluster using # spherical law of cosines (slc) get_dist <- function(lat1, lon1, lat2, lon2) { r <- 3958.756 # earth mean radius [miles] d <- acos(sin(radians(lat1))*sin(radians(lat2)) + cos(radians(lat1))*cos(radians(lat2)) * cos(radians(lon2)-radians(lon1))) * r return(d) # distance in miles } dist_mat<-lapply()
this type of computation want vectorize in r. here use outer generate possible combinations of row indices geocode data , center_x data, , apply distance function in 1 fell swoop.
first, data in easier use form (one matrix locations, centers, first column lats, second lons):
# see data section below actual data used # g <- radians(geocode) # c <- radians(cbind(center_lat[, 2], center_lon[, 2])) r <- 3958.756 # earth mean radius [miles] define function, notice how use indices actual coordinates in g , c, , how function vectorized (i.e. need call once data):
my_dist <- function(xind, yind) acos( sin(g[xind, 1]) * sin(c[yind, 1]) + cos(g[xind, 1]) * cos(c[yind, 1]) * cos(c[yind, 2] - g[xind, 2]) ) * r and apply outer:
dists <- outer(seq.int(nrow(g)), seq.int(nrow(c)), my_dist) str(dists) # num [1:3000, 1:30] 4208 6500 8623 7303 3864 ... quantile(dists) # make sure stuff reasonable: # 0% 25% 50% 75% 100% # 0.000 4107.574 6204.799 8333.155 12422.059 this runs in 30ms on system.
data:
set.seed(1) lats <- runif(10000, -60, 60) * pi / 180 lons <- runif(10000, -179, 180) * pi / 180 g.ind <- sample(10000, 3000) c.ind <- sample(10000, 30) g <- cbind(lats[g.ind], lons[g.ind]) c <- cbind(lats[c.ind], lons[c.ind])
Comments
Post a Comment