read.table - Dealing with scan error in R - Caused by misplaced \n separators in data


I'm reading in many large tab-separated .txt files using read.table in R. However, some lines contain newline breaks (\n) where there should be tabs (\t), which causes an error in scan(...). How can I deal with this issue robustly? (Is there a way to replace \n --> \t every time scan encounters an error?)

Edit:

Here's a simple example:

read.table(text='a1\tb1\tc1\td1\na2\tb2\tc2\td2', sep='\t')

This works fine and returns a data frame. However, suppose there is, by mistake, a newline \n where there should be a tab \t (e.g., after c1):

read.table(text='a1\tb1\tc1\nd1\na2\tb2\tc2\td2', sep='\t')

This raises an error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 1 did not have 4 elements

Note: using fill=TRUE won't help, because it will push d1 onto a new row.
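For illustration, here is roughly what fill=TRUE produces for the broken input (output shown as comments; the exact padding of the short rows may display slightly differently depending on your R settings):

read.table(text='a1\tb1\tc1\nd1\na2\tb2\tc2\td2', sep='\t', fill=TRUE)
#   V1 V2 V3 V4
# 1 a1 b1 c1
# 2 d1
# 3 a2 b2 c2 d2
# d1 starts its own padded row instead of completing row 1.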

If you have exactly the problem you describe (no missing data, just the wrong separator), try:

library(readr)
initial_lines <- read_lines('a1\tb1\tc1\nd1\na2\tb2\tc2\td2')

seperated_together <- unlist(strsplit(initial_lines, "\t", fixed = TRUE))

matrix(seperated_together, ncol = 4)

gives:

     [,1] [,2] [,3] [,4]
[1,] "a1" "c1" "a2" "c2"
[2,] "b1" "d1" "b2" "d2"

and you can transform it as you wish.
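For instance, one way to transform it (assuming every record really has exactly 4 fields) is to refill the matrix by row and convert it to a data frame:

# fill by row instead of by column, then wrap in a data frame
fixed <- matrix(seperated_together, ncol = 4, byrow = TRUE)
as.data.frame(fixed, stringsAsFactors = FALSE)
#   V1 V2 V3 V4
# 1 a1 b1 c1 d1
# 2 a2 b2 c2 d2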

If you have missing data or other complications, you'll have to use:

strsplit(initial_lines, '\t', fixed = TRUE)

which gives:

[[1]]
[1] "a1" "b1" "c1"

[[2]]
[1] "d1"

[[3]]
[1] "a2" "b2" "c2" "d2"

and you'll have to parse through the elements, combining them based on the number of elements in each, as sketched below.
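A rough sketch of that re-combining step, assuming each complete record should have exactly 4 fields and the breaks come only from stray newlines:

parts <- strsplit(initial_lines, "\t", fixed = TRUE)
records <- list()
buffer <- character(0)
for (p in parts) {
  # accumulate fields until a full 4-field record is available
  buffer <- c(buffer, p)
  while (length(buffer) >= 4) {
    records[[length(records) + 1]] <- buffer[1:4]
    buffer <- buffer[-(1:4)]
  }
}
do.call(rbind, records)  # one row per reconstructed record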

You could also have a look at ?count_fields in readr.
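For example, count_fields() reports the number of fields on each line, so lines with fewer than 4 fields are the ones that need re-joining ('broken.txt' is a hypothetical file containing the broken lines above):

library(readr)
count_fields('broken.txt', tokenizer = tokenizer_tsv())
# e.g. 3 1 4  -> the lines with fewer than 4 fields were split by a stray \n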

