bioinformatics - Splicing through a line of a text file using Python
I am trying to create genetic signatures. I have a text file full of DNA sequences. I want to read in each line from the text file, then add the 4-mers (runs of 4 bases) to a dictionary. For example, take this sample sequence:
atgatatatctatcat
What I want is to add atga, tgat, gata, etc. to a dictionary, with IDs that increment by 1 as the 4-mers are added.
So the dictionary will hold...
genetic signature, id
atga, 1
tgat, 2
gata, 3
Here is what I have so far...
    import sys

    def main():
        readingfile = open("signatures.txt", "r")
        my_dna = ""
        dnaseq = {}  # creates dictionary

        for char in readingfile:
            my_dna = my_dna + char

        for char in my_dna:
            index = 0
            dnaid = 1
            seq = my_dna[index:index+4]
            if (dnaseq.has_key(seq)):  # checks if key in dictionary
                index = index + 1
            else:
                dnaseq[seq] = dnaid
                index = index + 1
                dnaid = dnaid + 1

        readingfile.close()

    if __name__ == '__main__':
        main()
Here is my output:
actc actc actc actc actc actc
This output suggests that it is not iterating through each character in the string... Please help!
You need to move the index and dnaid declarations before the loop, otherwise they will be reset on every loop iteration:
    index = 0
    dnaid = 1
    for char in my_dna:
        # ... rest of loop here
Once you make that change, you will have this output:
    atga 1
    tgat 2
    gata 3
    atat 4
    tata 5
    atat 6
    tatc 6
    atct 7
    tcta 8
    ctat 9
    tatc 10
    atca 10
    tcat 11
    cat 12
    at 13
    t 14
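The short entries at the end of that output come from how Python slices strings: a slice that runs past the end of the string returns whatever characters are left instead of raising an error. A small illustration using the sample sequence from the question (the variable name `tail` is only for this example):

```python
my_dna = "atgatatatctatcat"

# Take a 4-character slice starting at each of the last four positions.
# Only the first one has four characters left to take; the rest come back short.
tail = [my_dna[i:i + 4] for i in range(len(my_dna) - 4, len(my_dna))]
print(tail)  # ['tcat', 'cat', 'at', 't']
```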
To avoid the last 3 items, which are not the correct length, you can modify the loop:
    for i in range(len(my_dna)-3):
        # ... rest of loop here
This doesn't loop through the last 3 characters, making the output:
    atga 1
    tgat 2
    gata 3
    atat 4
    tata 5
    atat 6
    tatc 6
    atct 7
    tcta 8
    ctat 9
    tatc 10
    atca 10
    tcat 11
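Putting both fixes together, here is a minimal sketch of the whole idea for a single sequence. The function name `kmer_ids` and its parameters are made up for this example, and it uses `in` rather than `has_key()`, since `has_key()` no longer exists in Python 3:

```python
def kmer_ids(sequence, k=4):
    """Map each distinct k-mer in `sequence` to an ID, numbered from 1
    in order of first appearance. (Hypothetical helper, not from the question.)"""
    ids = {}
    next_id = 1  # initialised once, before the loop, so it is never reset
    # Stop k-1 characters before the end so every slice is exactly k long.
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer not in ids:  # `in` replaces dict.has_key()
            ids[kmer] = next_id
            next_id += 1
    return ids


signatures = kmer_ids("atgatatatctatcat")
for kmer, kmer_id in signatures.items():
    print(kmer, kmer_id)
```

Here a repeated 4-mer such as atat keeps the ID it was given the first time, so the dictionary ends up with 11 entries for the sample sequence. When the sequences come from a file, calling `line.strip()` on each line first keeps the trailing newline out of the 4-mers.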