read.table - Dealing with scan error in R - Caused by misplaced \n separators in data


I'm reading in many large tab-separated .txt files using read.table in R. However, some lines contain newline breaks (\n) where there should be tabs (\t), which causes an error in scan(...). How can I deal with this issue robustly? (Is there a way to replace \n --> \t every time scan encounters an error?)

Edit:

Here's a simple example:

read.table(text='a1\tb1\tc1\td1\na2\tb2\tc2\td2', sep='\t')

This works fine and returns a data frame. However, suppose there is, by mistake, a newline \n where there should be a tab \t (e.g., after c1):

read.table(text='a1\tb1\tc1\nd1\na2\tb2\tc2\td2', sep='\t')

This raises an error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 1 did not have 4 elements

Note: using fill=TRUE won't help, because it will push d1 onto a new row.
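For illustration, here is roughly what fill=TRUE produces for the broken input (output shown as comments; the exact padding of the short rows may display slightly differently depending on your R settings):

read.table(text='a1\tb1\tc1\nd1\na2\tb2\tc2\td2', sep='\t', fill=TRUE)
#   V1 V2 V3 V4
# 1 a1 b1 c1
# 2 d1
# 3 a2 b2 c2 d2
# d1 starts its own padded row instead of completing row 1.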

If you have exactly the problem you describe (no missing data, just the wrong separator), try:

library(readr)
initial_lines <- read_lines('a1\tb1\tc1\nd1\na2\tb2\tc2\td2')

seperated_together <- unlist(strsplit(initial_lines, "\t", fixed = TRUE))

matrix(seperated_together, ncol = 4)

gives:

     [,1] [,2] [,3] [,4]
[1,] "a1" "c1" "a2" "c2"
[2,] "b1" "d1" "b2" "d2"

and you can transform it as you wish.
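For instance, one way to transform it (assuming every record really has exactly 4 fields) is to refill the matrix by row and convert it to a data frame:

# fill by row instead of by column, then wrap in a data frame
fixed <- matrix(seperated_together, ncol = 4, byrow = TRUE)
as.data.frame(fixed, stringsAsFactors = FALSE)
#   V1 V2 V3 V4
# 1 a1 b1 c1 d1
# 2 a2 b2 c2 d2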

If you have missing data or other complications, you'll have to use:

strsplit(initial_lines, '\t', fixed = TRUE)

which gives:

[[1]]
[1] "a1" "b1" "c1"

[[2]]
[1] "d1"

[[3]]
[1] "a2" "b2" "c2" "d2"

and you'll have to parse through the elements, combining them based on the number of elements in each, as sketched below.
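A rough sketch of that re-combining step, assuming each complete record should have exactly 4 fields and the breaks come only from stray newlines:

parts <- strsplit(initial_lines, "\t", fixed = TRUE)
records <- list()
buffer <- character(0)
for (p in parts) {
  # accumulate fields until a full 4-field record is available
  buffer <- c(buffer, p)
  while (length(buffer) >= 4) {
    records[[length(records) + 1]] <- buffer[1:4]
    buffer <- buffer[-(1:4)]
  }
}
do.call(rbind, records)  # one row per reconstructed record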

You could also have a look at ?count_fields in readr.
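For example, count_fields() reports the number of fields on each line, so lines with fewer than 4 fields are the ones that need re-joining ('broken.txt' is a hypothetical file containing the broken lines above):

library(readr)
count_fields('broken.txt', tokenizer = tokenizer_tsv())
# e.g. 3 1 4  -> the lines with fewer than 4 fields were split by a stray \n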

