dataset - Fast access to lines in huge text file

September 15, 2015

I am working with a relatively large text file (70 GB uncompressed, 15 GB gzipped) that contains 3 columns. The lines of the file are of the form:

    x1 | y1 | a1
    x1 | y2 | a2
    x2 | y3 | a3
    x3 | y4 | a4

The x and y values are sequences of words that can contain between 1 and 4 words. The strings in the first column are sorted and not unique. The strings in the second column are not unique, and are sorted within the same element of the first column.

There are 700,000,000 lines in the uncompressed text file, and what I want is, given a query of tuples (x, y), to get the value in the third column. I need this access to take as little time as possible.

What I tried was to create 2 dictionaries of (string, list of integers). The first dictionary maps a string to the indexes of the lines that contain that string in the first column, and the same for the second dictionary and the second column. For a query (x, y), I can intersect these 2 lists and get the line that contains "x | y | a". I can then use a dictionary that maps a line number to its offset in the file and use a random access file to read the line.
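To make the approach concrete, here is a minimal Java sketch of it at small scale. The class and method names (`TwoIndexLookup`, `lookup`) are hypothetical, and it assumes UTF-8 text, that `|` never occurs inside a field, and that returning the first matching line is enough. It is only meant to show the two dictionaries, the list intersection, and the offset-based read, not a tuned implementation.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: two inverted indexes plus a line-number -> offset table.
class TwoIndexLookup {
    private final Map<String, List<Integer>> byX = new HashMap<>();
    private final Map<String, List<Integer>> byY = new HashMap<>();
    private final List<Long> lineOffsets = new ArrayList<>(); // index = line number
    private final Path file;

    TwoIndexLookup(Path file) throws IOException {
        this.file = file;
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            long offset = 0;    // bytes consumed so far
            long lineStart = 0; // byte offset where the current line begins
            int b;
            while ((b = in.read()) != -1) {
                offset++;
                if (b == '\n') {
                    indexLine(new String(buf.toByteArray(), StandardCharsets.UTF_8), lineStart);
                    buf.reset();
                    lineStart = offset;
                } else {
                    buf.write(b);
                }
            }
            if (buf.size() > 0) { // last line without a trailing newline
                indexLine(new String(buf.toByteArray(), StandardCharsets.UTF_8), lineStart);
            }
        }
    }

    private void indexLine(String line, long offset) {
        String[] parts = line.split("\\|");
        if (parts.length != 3) {
            return; // skip lines that do not have exactly three columns
        }
        int lineNo = lineOffsets.size();
        lineOffsets.add(offset);
        byX.computeIfAbsent(parts[0].trim(), k -> new ArrayList<>()).add(lineNo);
        byY.computeIfAbsent(parts[1].trim(), k -> new ArrayList<>()).add(lineNo);
    }

    // Returns the third column of the first line matching (x, y), or null if none.
    String lookup(String x, String y) throws IOException {
        List<Integer> xs = byX.get(x);
        List<Integer> ys = byY.get(y);
        if (xs == null || ys == null) {
            return null;
        }
        // Both lists are in ascending line order, so a two-pointer walk intersects them.
        int i = 0;
        int j = 0;
        while (i < xs.size() && j < ys.size()) {
            int a = xs.get(i);
            int b = ys.get(j);
            if (a < b) {
                i++;
            } else if (a > b) {
                j++;
            } else {
                return readThirdColumn(lineOffsets.get(a));
            }
        }
        return null;
    }

    private String readThirdColumn(long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            int b;
            while ((b = raf.read()) != -1 && b != '\n') {
                buf.write(b);
            }
            String line = new String(buf.toByteArray(), StandardCharsets.UTF_8);
            return line.split("\\|")[2].trim();
        }
    }
}
```

With the four example lines above in a file, `new TwoIndexLookup(path).lookup("x1", "y2")` would read back the second line and return its third column. Every line number is stored twice (once per dictionary) plus once in the offset table, all as boxed objects.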

The problem is that this requires way too much memory (maybe it's because I'm using Java!). I am looking for a solution that can query the text file quickly and doesn't require more than 20 to 30 GB of RAM.
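A rough back-of-the-envelope estimate shows where the memory goes. The sketch below counts only the numbers themselves: each of the 700,000,000 line numbers appears once in each of the 2 dictionaries, and there is one offset per line. The class name `IndexMemoryEstimate` is hypothetical, and the per-object sizes passed to `boxedBytes` are assumed typical values for a 64-bit JVM, not measurements. String keys and hash table overhead are left out, so the real figure would be higher.

```java
// Hypothetical helper for estimating index size; not a measurement tool.
class IndexMemoryEstimate {
    static final long LINES = 700_000_000L; // line count given in the question

    // Bytes needed if line numbers and offsets were held in primitive int[] / long[] arrays.
    static long primitiveBytes() {
        long lineNumbers = 2 * LINES * Integer.BYTES; // one int per line in each dictionary
        long offsets = LINES * Long.BYTES;            // one long offset per line
        return lineNumbers + offsets;
    }

    // Bytes needed if every number is a boxed object reached through a reference.
    // All three sizes are caller-supplied assumptions.
    static long boxedBytes(long bytesPerReference, long bytesPerBoxedInteger, long bytesPerBoxedLong) {
        long lineNumbers = 2 * LINES * (bytesPerReference + bytesPerBoxedInteger);
        long offsets = LINES * (bytesPerReference + bytesPerBoxedLong);
        return lineNumbers + offsets;
    }
}
```

`IndexMemoryEstimate.primitiveBytes()` comes to 11,200,000,000 bytes (about 11.2 GB). With assumed sizes of 4 bytes per reference, 16 bytes per `Integer`, and 24 bytes per `Long`, `boxedBytes(4, 16, 24)` comes to 47,600,000,000 bytes (about 47.6 GB), which is already over the 20 to 30 GB budget before counting any strings.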

I guess there are methods for this kind of thing, but I'm not familiar with them. Any ideas?

Thanks
