regex - How to extract FASTA sequences from a file whose header lines match a list in another file?


I'm a newbie to Perl. I'm trying to extract FASTA sequences from one file whose headers match lines in another file. The two example files are as follows:

file1.fasta:

>gene_44|105_nt|+|47540|47644 gtgcgccggcgcgtcgcgatcgcgaaccggcccgtgcgaatcctgccgcatgcgcgccgcatctcgccacgccgcgcatttcatttcgacatccataacgtctga

>gene_69|111_nt|+|75846|75956 atgccgttgccgtcgcgcatcgcggcggccgtgcgcggcgcgcatgcatacgccggcacggccgatgcgcgcgcgacgcgcaaactgcacgcggcgcgggatttgtgttga

>gene_88|177_nt|-|97993|98169
atgcgccagccgacgcacgcccattccgggcgaaacgttccccttatccattcgatcatccgtgccgcactgcgcgaagcggccaccgccgacacgtaccaaaccgcgctcgatgcgaccggcgcggcactcgtcgccatcgcggcgctcgtgcgcgcggaggtgcggcatggctga

>gene_90|141_nt|-|99016|99156
ttggaagggcgctttccgcgtgcgagtcgtctgacgcagcgttgcacggtctggtcgaatcgcgagcttcatcgctggatggccgatccgttgaactatcgcgctgtcgacgcggcgaaccagacgacggagggcgcgtaa

file2.list:

somewordsinfront, >gene_44|somewordsattheback

blablabla, >gene_88|blablablablabla

The output I expect is as follows:

>gene_44|105_nt|+|47540|47644 gtgcgccggcgcgtcgcgatcgcgaaccggcccgtgcgaatcctgccgcatgcgcgccgcatctcgccacgccgcgcatttcatttcgacatccataacgtctga

>gene_88|177_nt|-|97993|98169
atgcgccagccgacgcacgcccattccgggcgaaacgttccccttatccattcgatcatccgtgccgcactgcgcgaagcggccaccgccgacacgtaccaaaccgcgctcgatgcgaccggcgcggcactcgtcgccatcgcggcgctcgtgcgcgcggaggtgcggcatggctga

How can I achieve that? Thanks in advance! :)

Next time you ask a question, please show your code. Here is an example:

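use strict;
use warnings;

# Collect the gene IDs (the text between '>' and '|') from the list file.
my @genes;
open my $list, '<', 'file2.list' or die "Cannot open file2.list: $!";
while (my $line = <$list>) {
    push @genes, $1 if $line =~ /[^>]+>([^|]+)/;
}
close $list;

# Slurp the whole FASTA file into one string.
my $input;
{
    local $/ = undef;
    open my $fasta, '<', 'file1.fasta' or die "Cannot open file1.fasta: $!";
    $input = <$fasta>;
    close $fasta;
}

# Split into records on '>' and print each record whose header starts with a
# listed ID. Anchoring on the trailing '|' keeps gene_4 from matching gene_44.
my @lines = split />/, $input;
foreach my $l (@lines) {
    foreach my $reg (@genes) {
        print ">$l" if $l =~ /^\Q$reg\E\|/;
    }
}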
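If the list gets long, the nested loops above compare every record against every ID. A possible alternative, shown here only as a rough sketch, is to store the IDs in a hash and look up each record's ID once. The helper names `wanted_ids` and `select_records` are made up for illustration, the sample data is shortened from the question, and the sketch assumes every ID in both files is followed by a `|`.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the IDs that follow '>' in each list line.
sub wanted_ids {
    my (@list_lines) = @_;
    my %wanted;
    for my $line (@list_lines) {
        # Capture the text between '>' and the next '|' (assumed ID format).
        $wanted{$1} = 1 if $line =~ />([^|\s]+)\|/;
    }
    return \%wanted;
}

# Hypothetical helper: return the FASTA records whose ID is in %$wanted.
sub select_records {
    my ($wanted, $fasta_text) = @_;
    my @hits;
    # Split just before every '>' at the start of a line, so each record
    # keeps its own header.
    for my $record (split /^(?=>)/m, $fasta_text) {
        my ($id) = $record =~ /^>([^|\s]+)/ or next;
        push @hits, $record if $wanted->{$id};
    }
    return @hits;
}

# Shortened sample data based on the question (sequences abbreviated).
my @list  = ("somewordsinfront, >gene_44|somewordsattheback\n",
             "blablabla, >gene_88|blablablablabla\n");
my $fasta = ">gene_44|105_nt|+|47540|47644\ngtgcgcc\n"
          . ">gene_69|111_nt|+|75846|75956\natgccgt\n"
          . ">gene_88|177_nt|-|97993|98169\natgcgcc\n";

print select_records(wanted_ids(@list), $fasta);
```

With real files, the list lines and the FASTA text would be read from file2.list and file1.fasta instead of the inline sample. Because the lookup is an exact hash match on the ID, gene_4 in the list would not select gene_44.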
