Calculation with Nested loops in R -


i have geocode dataset 3 columns: latitude, longitude , cluster. calculated average center of clusters , store results in 2 lists center_lat , center_lon.

now wanna calculate distance each observation(3000+) each cluster center(30) haversine formula. 3000 30 matrix.

i tried use nested loop, got same distance clusters. here's code.

for (i in 1:n){     (k in 1:c){     lat1=radians(geocode[i,1])     lon1=radians(geocode[i,2])     lat2=radians(center_lat[k,2])     lon2=radians(center_lon[k,2])     }     r <- 3958.756 # earth mean radius [miles]     dist_mat[i,] <- acos(sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2) * cos(lon2-lon1)) * r   } 

i'm thinking using lapply substitute nested loop. i'm not sure how use function... appreciated.

# convert radian radians = function(theta=0){return(theta * pi / 180)}  # calculates geodesic distance each property center of it's current cluster using # spherical law of cosines (slc) get_dist <- function(lat1, lon1, lat2, lon2) {   r <- 3958.756 # earth mean radius [miles]   d <- acos(sin(radians(lat1))*sin(radians(lat2)) +                cos(radians(lat1))*cos(radians(lat2)) * cos(radians(lon2)-radians(lon1))) * r   return(d) # distance in miles }  dist_mat<-lapply() 

this type of computation want vectorize in r. here use outer generate possible combinations of row indices geocode data , center_x data, , apply distance function in 1 fell swoop.

first, data in easier use form (one matrix locations, centers, first column lats, second lons):

# see data section below actual data used  # g <- radians(geocode) # c <- radians(cbind(center_lat[, 2], center_lon[, 2]))     r <- 3958.756 # earth mean radius [miles] 

define function, notice how use indices actual coordinates in g , c, , how function vectorized (i.e. need call once data):

my_dist <- function(xind, yind)   acos(     sin(g[xind, 1]) * sin(c[yind, 1]) +      cos(g[xind, 1]) * cos(c[yind, 1]) * cos(c[yind, 2] - g[xind, 2])   ) * r 

and apply outer:

dists <- outer(seq.int(nrow(g)), seq.int(nrow(c)), my_dist)  str(dists) # num [1:3000, 1:30] 4208 6500 8623 7303 3864 ...  quantile(dists) # make sure stuff reasonable: #     0%       25%       50%       75%      100%  #  0.000  4107.574  6204.799  8333.155 12422.059      

this runs in 30ms on system.


data:

set.seed(1) lats <- runif(10000, -60, 60) * pi / 180 lons <- runif(10000, -179, 180) * pi / 180  g.ind <- sample(10000, 3000) c.ind <- sample(10000, 30)  g <- cbind(lats[g.ind], lons[g.ind]) c <- cbind(lats[c.ind], lons[c.ind]) 

Comments

Popular posts from this blog

shopping cart - Page redirect not working PHP -

php - How to modify a menu to show sub-menus -

python - Installing PyDev in eclipse is failed -