r - RStan in RStudio: MCMC run time is far too long (limited use of available CPU and RAM)
I am a newbie in the RStan world, and I need it for my thesis. I am using a script and a dataset similar to those of a guy at NYU, who reports an estimated run time of 18 hours for a similar dataset. However, when I try to run my model, it does not get past 10% in 18 hours. So I am asking for a little help to understand what I am doing wrong and how to improve efficiency.
I am running 500 iterations with 100 warmup and 2 chains on a model with a bernoulli_logit function on 5 parameters, trying to estimate 2 of them through the No-U-Turn MC procedure. (At each step it draws random normal values for each parameter, estimates y, and compares it with the actual data to see whether the new parameters fit the data better.)
y[n] ~ bernoulli_logit( alpha[kk[n]] + beta[jj[n]] - gamma * square( theta[jj[n]] - phi[kk[n]] ) );
(N being about 10 million.) The data is a 10,000 x 1004 matrix of 0s and 1s. To wrap up, it is a matrix of people following politicians on Twitter, and I want to estimate their political ideas given whom they follow. I run the model in RStudio with R x64 3.1.1 on Windows 8 Professional, 64-bit, on an i7 quad core with 16 GB of RAM. Checking performance, rsession uses no more than 14% of the CPU and 6 GB of RAM, although 7 more GB are free. When trying a subsample, a 10,000 x 250 matrix, I noticed that it uses below 1.5 GB instead. However, I have tried the procedure with a 50 x 50 dataset and it worked fine, so there is no mistake in the procedure. rsession opens 8 threads, and I see activity on each core, but none is fully occupied. I wonder why my PC does not work at the best of its possibilities, and whether there might be a bottleneck, a cap, or a setup that prevents it from doing so. R is 64-bit (just checked) and RStan should be too (even though I had some difficulties installing it and might have messed up some parameters).
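A side note on the 14% figure, offered as a likely explanation rather than a diagnosis: by default a Stan chain runs on a single thread, and on a machine that exposes 8 logical cores one fully busy thread amounts to 100 / 8 = 12.5% of total CPU, which is close to what is reported here. The sketch below only does that arithmetic. The value 8 is an assumption taken from the thread count mentioned above, and the variable names are hypothetical.

```r
# Assumption: 8 logical cores (quad core with two hardware threads per core)
n_logical_cores <- 8

# Share of total CPU that one fully busy thread can account for
one_thread_share <- 100 / n_logical_cores
print(one_thread_share)

# parallel::detectCores() reports the logical core count on the current machine
```

Later rstan releases let `stan()` run chains in parallel through its `cores` argument (or `options(mc.cores = ...)`). Whether that is available depends on the installed version, and with 2 chains it can use at most 2 cores.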
This is what happens when I compile it:
    iteration: 1 / 1 [100%] (sampling)
    # elapsed time: 0 seconds (warm-up)
    # 11.451 seconds (sampling)
    # 11.451 seconds (total)
    sampling model 'stan.code' (chain 2).
    iteration: 1 / 1 [100%] (sampling)
    # elapsed time: 0 seconds (warm-up)
    # 12.354 seconds (sampling)
    # 12.354 seconds (total)
When I run it, however, it works for hours but never goes beyond 10% of the first chain (mainly because I interrupted it after my PC melted down).
iteration: 1 / 500 [ 0%] (warmup)
And it has this setting:
    ## compiling the model
    stan.model <- stan(model_code=stan.code, data = stan.data, init=inits, iter=1, warmup=0, chains=2)
    ## running the model
    stan.fit <- stan(fit=stan.model, data = stan.data, iter=500, warmup=100, chains=2, thin=thin, init=inits)
Please help me find what is slowing down the procedure (and, if nothing wrong is happening, what can I manipulate to still have a reasonable result in a shorter time?).
I thank you in advance,
ml
Here's the model (from Pablo Barbera, NYU):
    n.iter <- 500
    n.warmup <- 100
    thin <- 2 ## this gives 200 saved draws for each chain and parameter

    ## 10,000 x 1004 matrix of {0, 1}: relationship "user i follows politician j"
    adjmatrix <- read.csv("d:/thematrix/adjmatrix_1004by10000_20150424.txt", header=FALSE)
    ## vector of 1004 values in [-1, 1] that should be the prior for the phi I want to estimate
    startphi <- read.csv("d:/thematrix/startphi_20150424.txt", header=FALSE)
    start.phi <- ba <- c(do.call("cbind", startphi))

    y <- adjmatrix
    J <- dim(y)[1]   # number of users (rows)
    K <- dim(y)[2]   # number of politicians (columns)
    N <- J * K
    jj <- rep(1:J, times=K)   # user index of each observation
    kk <- rep(1:K, each=J)    # politician index of each observation
    stan.data <- list(J=J, K=K, N=N, jj=jj, kk=kk, y=c(as.matrix(y)))

    ## rest of the starting values
    colK <- colSums(y)
    rowJ <- rowSums(y)
    normalize <- function(x){ (x-mean(x))/sd(x) }
    inits <- rep(list(list(alpha=normalize(log(colK+0.0001)),
                           beta=normalize(log(rowJ+0.0001)),
                           theta=rnorm(J), phi=start.phi,
                           mu_beta=0, sigma_beta=1, gamma=abs(rnorm(1)),
                           mu_phi=0, sigma_phi=1, sigma_alpha=1)), 2)
    ## alpha and beta are the popularity of politician j and the propensity of user i to follow people;
    ## phi and theta are the positions on the political spectrum of politician j and user i; phi has a prior given by expert surveys
    ## gamma is the weight on the importance of political closeness

    library(rstan)
    stan.code <- '
    data {
      int<lower=1> J;               // number of twitter users
      int<lower=1> K;               // number of elite twitter accounts
      int<lower=1> N;               // N = J x K
      int<lower=1,upper=J> jj[N];   // twitter user for observation n
      int<lower=1,upper=K> kk[N];   // elite account for observation n
      int<lower=0,upper=1> y[N];    // dummy: 1 if user i follows elite j
    }
    parameters {
      vector[K] alpha;
      vector[K] phi;
      vector[J] theta;
      vector[J] beta;
      real mu_beta;
      real<lower=0.1> sigma_beta;
      real mu_phi;
      real<lower=0.1> sigma_phi;
      real<lower=0.1> sigma_alpha;
      real gamma;
    }
    model {
      alpha ~ normal(0, sigma_alpha);
      beta ~ normal(mu_beta, sigma_beta);
      phi ~ normal(mu_phi, sigma_phi);
      theta ~ normal(0, 1);
      for (n in 1:N)
        y[n] ~ bernoulli_logit( alpha[kk[n]] + beta[jj[n]] - gamma * square( theta[jj[n]] - phi[kk[n]] ) );
    }
    '

    ## compiling the model
    stan.model <- stan(model_code=stan.code, data = stan.data, init=inits, iter=1, warmup=0, chains=2)
    ## running the model
    stan.fit <- stan(fit=stan.model, data = stan.data, iter=n.iter, warmup=n.warmup, chains=2, thin=thin, init=inits)
    samples <- extract(stan.fit, pars=c("alpha", "phi", "gamma", "mu_beta", "sigma_beta", "sigma_alpha"))
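As a quick sanity check on the long-format indexing used in the script, the following minimal sketch rebuilds `jj`, `kk`, and the flattened response for a tiny made-up 3 x 2 matrix and confirms that observation n maps back to cell (jj[n], kk[n]). The toy values and the names `y_toy`, `n_users`, `n_elites`, `n_obs`, and `index_ok` are hypothetical stand-ins, not part of the original script.

```r
# Toy 3 x 2 follow matrix (hypothetical values, for illustration only).
# matrix() fills column by column: column 1 = (1, 0, 1), column 2 = (0, 0, 1).
y_toy <- matrix(c(1, 0, 1,
                  0, 0, 1), nrow = 3, ncol = 2)

n_users  <- nrow(y_toy)          # plays the role of the number of users
n_elites <- ncol(y_toy)          # plays the role of the number of politicians
n_obs    <- n_users * n_elites   # one observation per (user, politician) pair

jj <- rep(1:n_users, times = n_elites)   # user index, cycling fastest
kk <- rep(1:n_elites, each = n_users)    # politician index, one block per column

# c() on a matrix flattens it in column-major order, matching jj and kk
y_vec <- c(y_toy)

# Every flattened entry should equal the matrix cell its indices point to
index_ok <- all(y_vec == y_toy[cbind(jj, kk)])
print(index_ok)
```

If `index_ok` were FALSE on the real data, the model would be pairing follows with the wrong users or politicians, so this is a cheap check to run before a long sampling job.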
First, apologies: I would have posted this as a comment, but I don't have enough reputation.
Here's the question you asked: "what can I manipulate to still have a reasonable result in a shorter time?"
The answer is, it depends. Instead of representing things as a binary matrix, have you tried reducing the size of the matrix by using counts? Based on the type of model you're trying to run, I imagine there is a non-identifiability in the posterior. Have you tried reparameterizing?
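One way to see the kind of non-identifiability that may be meant here (an illustration under my own assumptions, not something the answer spells out): the linear predictor depends on theta and phi only through their squared distance. Flipping the sign of both, shifting both by the same constant, or rescaling both while adjusting gamma therefore leaves every follow probability unchanged. A minimal sketch with made-up values follows. All names and numbers are hypothetical.

```r
set.seed(1)
n_users  <- 4   # hypothetical toy sizes
n_elites <- 3

alpha     <- rnorm(n_elites)   # politician popularity
beta      <- rnorm(n_users)    # user propensity to follow
theta     <- rnorm(n_users)    # user positions
phi       <- rnorm(n_elites)   # politician positions
gamma_val <- 0.7               # weight on squared distance

jj <- rep(1:n_users, times = n_elites)
kk <- rep(1:n_elites, each = n_users)

# Same linear predictor as in the model statement of the question
linpred <- function(theta, phi, gamma_val) {
  alpha[kk] + beta[jj] - gamma_val * (theta[jj] - phi[kk])^2
}

eta       <- linpred(theta, phi, gamma_val)
eta_flip  <- linpred(-theta, -phi, gamma_val)            # reflection
eta_shift <- linpred(theta + 2, phi + 2, gamma_val)      # common shift
eta_scale <- linpred(2 * theta, 2 * phi, gamma_val / 4)  # rescale, adjust gamma

print(all.equal(eta, eta_flip))
print(all.equal(eta, eta_shift))
print(all.equal(eta, eta_scale))
```

In the posted model the prior `theta ~ normal(0, 1)` penalises the shift and the rescaling, but the reflection remains an exact symmetry of the likelihood, which can make sampling slow or produce mirrored chains unless the sign is pinned down some other way (for example through informative starting values or a constraint).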
Also, you may want to run it in CmdStan if R is causing problems with memory management.