Python - regex to match words of length specified within string
I am trying to parse the text output of samtools mpileup. I start with the string

    s = '.$......+2ag.+2ag.+2aggg'

Whenever I have a + followed by an integer n, I want to select the n characters following that integer and replace the whole thing with *. So for this test case I would have

    '.$......+2ag.+2ag.+2aggg' ---> '.$......*.*.*gg'

I have the regex \+[0-9]+[ACGTNacgtn]+, but it results in the output .$......*.*.*, so the trailing g's are lost as well. How can I select n characters when n is not known ahead of time but is specified in the string itself?
The repl argument in re.sub can be either a string or a function.
So you can do complex things with function replacements:
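
    import re

    def removechars(m):
        x = m.group()                            # whole match, e.g. '+2aggg'
        n = re.match(r'\+(\d+).*', x).group(1)   # digit part
        # skip the '+', the digits, and the n inserted bases; keep the rest
        return '*' + x[1 + len(n) + int(n):]

This solves the problem: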

    >>> re.sub(r'\+[0-9]+[ACGTNacgtn]+', removechars, s)
    '.$......*.*.*gg'
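
The same idea can be written without the second re.match call by putting capture groups in the outer pattern, so the replacement function reads the count and the bases directly from the match. This is only a sketch of that variant: the names INSERTION and collapse_insertions are made up for illustration, and it assumes insertions always look like a + followed by digits and then bases from ACGTN in either case.

```python
import re

# Hypothetical names, not from the original answer.
# Group 1 captures the count n, group 2 the greedy run of bases after it.
INSERTION = re.compile(r'\+(\d+)([ACGTNacgtn]+)')

def collapse_insertions(pileup):
    """Replace each '+n' plus its n inserted bases with '*'."""
    def repl(m):
        n = int(m.group(1))
        bases = m.group(2)
        # The first n bases belong to the insertion; anything after them
        # was over-matched by the greedy class and is put back.
        return '*' + bases[n:]
    return INSERTION.sub(repl, pileup)

s = '.$......+2ag.+2ag.+2aggg'
print(collapse_insertions(s))  # .$......*.*.*gg
```

The regex still over-matches the trailing bases, as in the question. The function just returns the surplus to the output, which is what makes a count read from the string itself workable.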