read.table - Dealing with scan error in R - Caused by misplaced \n separators in data
I'm reading in many large tab-separated .txt files using read.table in R. However, some lines contain newline breaks (\n) where there should be tabs (\t), which causes an error in scan(...). How can I deal with this issue robustly? (Is there a way to replace \n --> \t every time scan encounters an error?)
Edit:
Here's a simple example:
read.table(text='a1\tb1\tc1\td1\n a2\tb2\tc2\td2', sep='\t')
This works fine and returns a data frame. However, suppose there is, by mistake, a newline \n where there should be a tab \t (e.g., after c1):
read.table(text='a1\tb1\tc1\nd1\n a2\tb2\tc2\td2', sep='\t')
This raises an error:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 1 did not have 4 elements
Note: using fill=TRUE won't help, because it will push d1 to a new row.
If you have the exact problem you describe (no missing data, just the wrong separator), try:
library(readr)
initial_lines <- read_lines('a1\tb1\tc1\nd1\na2\tb2\tc2\td2')
separated_together <- unlist(strsplit(initial_lines, "\t", fixed = TRUE))
matrix(separated_together, ncol = 4)
This gives:
     [,1] [,2] [,3] [,4]
[1,] "a1" "c1" "a2" "c2"
[2,] "b1" "d1" "b2" "d2"
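You can then transform it as you wish. Note that matrix() fills column by column by default; passing byrow = TRUE fills it row by row, so each record lands on its own row.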
If you have missing data or other complications, you'll have to start from:
strsplit(initial_lines, '\t', fixed = TRUE)
which gives:
[[1]]
[1] "a1" "b1" "c1"

[[2]]
[1] "d1"

[[3]]
[1] "a2" "b2" "c2" "d2"
and you'll have to parse through the elements, combining them based on the number of elements in each.
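The combining step could be sketched roughly as follows, in base R. The helper name repair_lines and its arguments are made up for illustration. It assumes every complete record has exactly n_fields fields and that a stray newline always stands in for exactly one separator; it counts separators rather than strsplit() pieces so that an empty trailing field is still counted.

```r
# Hypothetical helper: glue broken lines back together until each record
# has the expected number of separator-delimited fields.
repair_lines <- function(lines, n_fields, sep = "\t") {
  records <- character(length(lines))  # upper bound on the number of records
  k <- 0L
  buffer <- NULL
  for (line in lines) {
    # A stray newline replaced a separator, so rejoin the pieces with `sep`.
    buffer <- if (is.null(buffer)) line else paste(buffer, line, sep = sep)
    # Number of fields = number of separators + 1.
    n <- lengths(regmatches(buffer, gregexpr(sep, buffer, fixed = TRUE))) + 1L
    if (n >= n_fields) {
      if (n > n_fields) warning("too many fields in record: ", buffer)
      k <- k + 1L
      records[k] <- buffer
      buffer <- NULL
    }
  }
  if (!is.null(buffer)) warning("incomplete trailing record: ", buffer)
  records[seq_len(k)]
}

raw <- c("a1\tb1\tc1", "d1", "a2\tb2\tc2\td2")
repaired <- repair_lines(raw, n_fields = 4)
df <- read.table(text = repaired, sep = "\t", stringsAsFactors = FALSE)
```

For a real file, raw would come from readLines() or read_lines(). If some fields can legitimately be missing, the field count alone cannot tell a broken line from a short one, and the warnings are the place to add your own handling.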
You could also have a look at ?count_fields in readr.
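As a rough illustration of how count_fields might help, the sketch below writes the example to a temporary file (the file and its contents are stand-ins for your real data) and flags the lines whose field count differs from the expected 4. It only locates the suspect lines; repairing them is still up to you.

```r
library(readr)

# Stand-in for one of the real .txt files.
path <- tempfile(fileext = ".txt")
writeLines(c("a1\tb1\tc1", "d1", "a2\tb2\tc2\td2"), path)

# Count the tab-separated fields on each line of the file.
n <- count_fields(path, tokenizer = tokenizer_tsv())

# Line numbers that do not have the expected 4 fields.
bad_lines <- which(n != 4)
```

Running this over each file before calling read.table shows quickly which files are affected and roughly how many lines need repair.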