Is R able to compute contingency tables on a big file without putting the whole file in RAM?


Let me explain the question:

I know that functions like table or xtabs compute contingency tables, but they expect a data.frame, which is always stored in RAM. It's really painful when trying to do this on a big file (say 20 GB, the maximum I have had to tackle).

On the other hand, SAS is perfectly able to do this, because it reads the file line by line and updates the result in the process. Hence there is only ever one line in RAM, which is much more acceptable.

I have done the same as SAS with ad-hoc Python programs on occasion, when I had to do more complicated stuff that I either didn't know how to do in SAS or thought was too cumbersome. Python's syntax and integrated features (dictionaries, regular expressions...) compensate for its weaknesses (speed, mainly, but when reading 20 GB, speed is limited by the hard drive anyway).

My question, then: I would like to know if there are packages to do this in R. I know it's possible to read a file line by line, as I do in Python, but computing simple statistics (contingency tables, for instance) on a big file is such a basic task that I feel there should be some more or less "integrated" feature for it in a statistical package.

Please tell me if this question should be asked on "Cross Validated". I had a doubt, since it's more about software than statistics.

You can use the package ff for this. It uses the hard disk drive instead of RAM, but it is implemented in a way that doesn't make it (significantly) slower than the normal way R uses RAM.

This is from the package description:

The ff package provides data structures that are stored on disk but behave (almost) as if they were in RAM by transparently mapping only a section (pagesize) in main memory.

I think this will solve your problem of loading a 20 GB file in RAM. I have used it myself for such purposes and it worked great.

See here a small example as well. This is the example from the xtabs documentation:

Base R

    # Example from ?xtabs
    d.ergo <- data.frame(type = paste0("t", rep(1:4, 9*4)),
                         subj = gl(9, 4, 36*4))
    > print(xtabs(~ type + subj, data = d.ergo)) # 4 replicates each
        subj
    type 1 2 3 4 5 6 7 8 9
      t1 4 4 4 4 4 4 4 4 4
      t2 4 4 4 4 4 4 4 4 4
      t3 4 4 4 4 4 4 4 4 4
      t4 4 4 4 4 4 4 4 4 4

ff package

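    library(ff)

    # ff has no character storage mode, so make sure type is a factor first
    d.ergo$type <- factor(d.ergo$type)

    # Convert to ff
    d.ergoff <- as.ffdf(d.ergo)

    > print(xtabs(~ type + subj, data = d.ergoff)) # 4 replicates each
        subj
    type 1 2 3 4 5 6 7 8 9
      t1 4 4 4 4 4 4 4 4 4
      t2 4 4 4 4 4 4 4 4 4
      t3 4 4 4 4 4 4 4 4 4
      t4 4 4 4 4 4 4 4 4 4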

You can check here for more information on memory manipulation.
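If an extra package is not an option, the "read a bit, update the result" idea from the question can also be sketched in base R alone. The function below is only an illustration, not something from ff or from the xtabs documentation: the name chunked_xtabs and its arguments are made up for this example, and it assumes a delimited text file with a header row, no line breaks inside fields, and at least one data row. It reads chunk_size lines at a time from a connection, counts the pairs seen in that chunk, and folds them into running totals, so only one chunk plus the (small) table of counts is ever in RAM.

```r
# Hypothetical helper (not part of base R or ff): two-way contingency table
# of a delimited text file, computed chunk by chunk.
chunked_xtabs <- function(path, row_var, col_var,
                          chunk_size = 100000L, sep = ",") {
  con <- file(path, open = "r")
  on.exit(close(con))

  # Read the header once so every chunk gets the same column names.
  header <- scan(text = readLines(con, n = 1L), what = "",
                 sep = sep, quiet = TRUE)

  totals <- NULL
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break  # end of file

    # Parse only this chunk; everything is kept as character.
    chunk <- read.table(text = lines, sep = sep, col.names = header,
                        colClasses = "character", quote = "\"",
                        comment.char = "", check.names = FALSE)

    # Counts for the pairs that occur in this chunk.
    counts <- as.data.frame(table(row_level = chunk[[row_var]],
                                  col_level = chunk[[col_var]]),
                            stringsAsFactors = FALSE)
    counts <- counts[counts$Freq > 0, ]

    # Fold the chunk into the running totals (one row per distinct pair).
    totals <- rbind(totals, counts)
    totals <- aggregate(Freq ~ row_level + col_level,
                        data = totals, FUN = sum)
  }

  # Turn the accumulated counts back into a table.
  xtabs(Freq ~ row_level + col_level, data = totals)
}
```

Called as chunked_xtabs("big.csv", "type", "subj"), where "big.csv" stands for your own file, this should give the same counts as xtabs(~ type + subj) on the fully loaded data. It will be slower than ff on repeated use, since the file is re-parsed every time, but it needs nothing beyond base R.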

