python - How can I print only the lines with five or more matches of a regular expression?
I'm trying to use regular expressions in Python to parse a large tab-delimited text file line by line, and print only the lines that contain 5 or more instances of 0/1 or 1/1.
My script is almost there, but I'm struggling with the "5 or more instances" part.
This prints the lines that have at least 1 match:

    import re

    f = open("infile.txt", "r")
    out = open("outfile.txt", "w")
    for line in f:
        if re.match(r"(.*)(0|1)/(1)(.*)", line):
            print >> out, line,

To print only the lines that have 5 or more matches, I tried findall and finditer as follows, but it didn't work:

    for line in f:
        x = len(re.findall(r"(.*)(0|1)/(1)(.*)", line)):
        if x > 5:
            print >> out, line,

Can anyone help me with this?
Here is an example of 1 line from the text file (all spaces are tabs in the file):
x 6529 . c a,g pass ac=4,2;af=0.6777 1/1:0,20 0/1:0,16 0/1:0,16 0/0:4,16 0/0:3,1
You can use {5,} to match a pattern 5 or more times:
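
    import re

    f = open("data.txt", "r")
    out = open("dataout.txt", "w")
    for line in f:
        # [01]/1 matches 0/1 or 1/1; {5,} requires the group to repeat at least 5 times
        if re.match(r".*([01]/1.*){5,}", line):
            print >> out, line,
    f.close()
    out.close()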
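
As for why the findall attempt in the question never reports more than one match per line: the greedy (.*) at both ends of the pattern lets a single match swallow the whole line, so there is nothing left for a second match. The trailing colon after the findall assignment is also a syntax error, and x > 5 would skip lines with exactly 5 matches. A minimal sketch of the counting approach is below, assuming Python 3 (the snippets above use the Python 2 print statement). The names GENOTYPE, has_min_matches, filter_lines and filter_file are made up for illustration. It also assumes that 0/1 and 1/1 never appear inside a longer token such as 0/10, which the bare pattern would count as well.

```python
import re

# Matches one occurrence of 0/1 or 1/1. There is no leading or trailing ".*",
# so findall can report every occurrence instead of one match per line.
GENOTYPE = re.compile(r"[01]/1")


def has_min_matches(line, minimum=5):
    """Return True if line contains at least `minimum` occurrences of 0/1 or 1/1."""
    return len(GENOTYPE.findall(line)) >= minimum


def filter_lines(lines, minimum=5):
    """Yield only the lines that have at least `minimum` matches."""
    for line in lines:
        if has_min_matches(line, minimum):
            yield line


def filter_file(in_path, out_path, minimum=5):
    """Copy the qualifying lines of in_path to out_path (hypothetical helper)."""
    with open(in_path) as f, open(out_path, "w") as out:
        out.writelines(filter_lines(f, minimum))
```

Calling filter_file("data.txt", "dataout.txt") would then do the same job as the loop above. Counting with findall avoids the backtracking that nested .* groups can cause on long lines, and the threshold becomes an ordinary parameter.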