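How to split on \t without splitting on new lines in Python
I have a text file (train.txt) in a folder. Each line holds a word and its tag separated by a tab, and sentences are separated by an empty line: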
    my      o
    name    o
    is      o
    alex    b
    .       o

    i       o
    am      o
    from    o
    london  b
    .       o
This is my code:

    import re

    def read_file(filename):
        file = open(filename).read().strip().split("\n\n")
        lines = []
        for line in file:
            lines.append(re.split(r'\t|\n', line))
        return lines

    train_sents = read_file("train.txt")
    print(train_sents[0])
The output is:

    ['my', 'o', 'name', 'o', 'is', 'o', 'alex', 'b', '.', 'o']

My question is: is it possible to split on \t without also splitting on the new line? For example, I would like output like this:

    [('my', 'o'), ('name', 'o'), ('is', 'o'), ('alex', 'b'), ('.', 'o')]
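Just iterate over the file and split each line. str.split() with no arguments splits on any run of whitespace, including the tab, and drops the trailing newline:

    with open(filename) as f:
        print([tuple(line.split()) for line in f])

Output:

    [('my', 'o'), ('name', 'o'), ('is', 'o'), ('alex', 'b'), ('.', 'o')]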
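The one-liner above assumes the file holds a single sentence. If it contains blank separator lines, "\n".split() returns an empty list, so each blank line turns into an empty tuple (). A minimal sketch that skips such lines, using io.StringIO as an assumed stand-in for the real file:

```python
import io

# io.StringIO stands in for open(filename) in this sketch.
sample = io.StringIO("my\to\nname\to\n\ni\to\n")

# Skip whitespace-only lines so blank separators do not become empty tuples.
pairs = [tuple(line.split()) for line in sample if line.strip()]
print(pairs)
# [('my', 'o'), ('name', 'o'), ('i', 'o')]
```

Note that this flattens all sentences into one list; the approaches below keep them apart.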
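To keep the sentences separated by empty lines apart, append each tuple to the last sublist, and start a new sublist whenever you meet an empty line:

    infile = "train.txt"  # path to the data file

    with open(infile) as f:
        sentences = [[]]
        for line in f:
            if line.strip():
                # Non-empty line: add (word, tag) to the current sentence.
                sentences[-1].append(tuple(line.split()))
            else:
                # Empty line: start a new sentence.
                sentences.append([])
    print(sentences[0])
    print(sentences[1])

Output:

    [('my', 'o'), ('name', 'o'), ('is', 'o'), ('alex', 'b'), ('.', 'o')]
    [('i', 'o'), ('am', 'o'), ('from', 'o'), ('london', 'b'), ('.', 'o')]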
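One caveat with the approach above: a trailing empty line at the end of the file, or two empty lines in a row, adds an empty sublist. A minimal sketch of a variant that only closes a sentence when it actually has content; group_sentences is a hypothetical name and io.StringIO stands in for the opened file:

```python
import io


def group_sentences(lines):
    """Hypothetical helper: collect (word, tag) tuples into one list per sentence."""
    sentences = []
    current = []
    for line in lines:
        if line.strip():
            current.append(tuple(line.split()))
        elif current:
            # A blank line ends the current sentence; extra blank lines are ignored.
            sentences.append(current)
            current = []
    if current:
        # The last sentence may not be followed by a blank line.
        sentences.append(current)
    return sentences


# Two blank lines between sentences and one at the end.
sample = io.StringIO("my\to\nis\to\n\n\ni\to\nam\to\n\n")
result = group_sentences(sample)
print(result)
# [[('my', 'o'), ('is', 'o')], [('i', 'o'), ('am', 'o')]]
```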
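You can also use itertools.groupby, using empty lines as the delimiter between groups:

    from itertools import groupby

    infile = "train.txt"  # path to the data file

    with open(infile) as f:
        # The key is True for non-empty lines, so each True group is one sentence.
        print([list(map(str.split, v))
               for k, v in groupby(f, key=lambda x: x.strip() != "")
               if k])

Output:

    [[['my', 'o'], ['name', 'o'], ['is', 'o'], ['alex', 'b'], ['.', 'o']], [['i', 'o'], ['am', 'o'], ['from', 'o'], ['london', 'b'], ['.', 'o']]]

You can map tuple over the inner lists if you need tuples instead of lists.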
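Mapping tuple can be done directly inside the comprehension, which gives the (word, tag) tuples from the question while keeping one list per sentence. A minimal sketch, with io.StringIO as an assumed stand-in for the opened file:

```python
import io
from itertools import groupby

# io.StringIO stands in for open(infile) in this sketch.
sample = io.StringIO("my\to\nname\to\n\ni\to\nfrom\to\n")

# groupby yields runs of consecutive lines with the same key; the key is True
# for non-blank lines, so each True run is one sentence.
sentences = [
    [tuple(line.split()) for line in group]
    for is_text, group in groupby(sample, key=lambda x: x.strip() != "")
    if is_text
]
print(sentences)
# [[('my', 'o'), ('name', 'o')], [('i', 'o'), ('from', 'o')]]
```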