python - How can I print only the lines with five or more matches of a regular expression?


I'm trying to use regular expressions in Python to parse a large tab-delimited text file line by line, and print only the lines that contain 5 or more instances of 0/1 or 1/1.

My script is nearly there, but I'm struggling with the "5 or more instances" part.

This will print the lines that have at least 1 match:

    import re

    f = open("infile.txt", "r")
    out = open("outfile.txt", "w")

    for line in f:
        if re.match(r"(.*)(0|1)/(1)(.*)", line):
            print >> out, line,

To print the lines that have 5 or more matches, I tried findall and finditer as follows, but it didn't work:

    for line in f:
        x = len(re.findall(r"(.*)(0|1)/(1)(.*)", line)):
        if x > 5:
            print >> out, line,

Can anyone help me with this?

Here is an example of 1 line of the text file (all spaces are tabs in the file):

x 6529 . c a,g pass ac=4,2;af=0.6777 1/1:0,20 0/1:0,16 0/1:0,16 0/0:4,16 0/0:3,1  

You can use {5,} to match a pattern 5 or more times:

    import re

    f = open("data.txt", "r")
    out = open("dataout.txt", "w")

    for line in f:
        # {5,} repeats the group "[01]/1 followed by anything" five or more times
        if re.match(r".*([01]/1.*){5,}", line):
            print >> out, line,
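The findall attempt in the question can also be made to work by counting matches directly. A likely reason it returned too few is that the greedy (.*) on both sides of the pattern swallows the whole line, so findall sees at most one match per line; in addition, x > 5 requires six matches rather than five. Below is a minimal sketch of that alternative, assuming Python 3 and that a bare [01]/1 is specific enough for the data. The names GENOTYPE, count_genotypes and filter_lines are hypothetical, chosen only for illustration.

```python
import re

# Bare pattern with no surrounding ".*", so every occurrence is its own match.
# Assumption: "0/1" or "1/1" never appears inside a longer token such as "10/1".
GENOTYPE = re.compile(r"[01]/1")


def count_genotypes(line):
    """Return the number of non-overlapping 0/1 or 1/1 occurrences in line."""
    return len(GENOTYPE.findall(line))


def filter_lines(lines, minimum=5):
    """Yield only the lines that contain at least `minimum` occurrences."""
    for line in lines:
        if count_genotypes(line) >= minimum:
            yield line
```

Because filter_lines accepts any iterable of strings, an open file object can be passed straight in, for example out.writelines(filter_lines(f)), which keeps memory use flat on a large file.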
