python - Multiple regex replacements based on lists in multiple files
I have a folder with multiple text files inside that I need to process and format using multiple replacement lists that look like this:
old string1~new string1
old string2~new string2
etc~blah
I run each replacement pair from the replacement lists on each line of the multiple text files. I already have a set of Python scripts that perform this operation. I wonder whether the code would be simpler and more maintainable if I switched to sed or awk. Would that be a better solution, or should I improve my Python code instead? I ask because the incoming text files arrive on a regular basis and each has a slightly different structure than before, with mistakes, misspellings, and multiple spaces, since these files are created by humans. Each time I have to tweak the code and the replacement lists to make it work properly. Thanks.
Unless your Python code is really bad, switching to awk will not make it more maintainable. That said, it's pretty simple in awk, but it does not scale well:
cat replacement-list-files* | awk 'FILENAME == "-" { split( $0, a, "~" ); repl[ a[1] ] = a[2]; next } { for( i in repl ) gsub( i, repl[i] ) }1' - input-file
Note that this works on only one file at a time. You could replace the 1 with
{ print > ( FILENAME ".new" ) }
to work on multiple files, but then you have to deal with closing files if you want to work on a large number of files, and it just becomes an unmaintainable mess. Stick with Python if you already have a working solution.
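For comparison, here is a minimal Python sketch of the same idea: load the old~new pairs, then run every pair over each line of every file. This is not the asker's actual code, which isn't shown. The function names, the *.txt glob, and the ".new" output suffix are assumptions made for illustration. It also treats the left side of each pair as a Python regular expression, so characters like "." or "(" in a list entry would need escaping.

```python
import re
from pathlib import Path


def load_replacements(list_path, sep="~"):
    """Read 'old~new' pairs, one per line, keeping the order of the list file."""
    pairs = []
    for line in Path(list_path).read_text(encoding="utf-8").splitlines():
        if not line.strip() or sep not in line:
            continue  # skip blank or malformed lines
        old, new = line.split(sep, 1)
        # Assumption: 'old' is meant as a regular expression.
        pairs.append((re.compile(old), new))
    return pairs


def apply_replacements(line, pairs):
    """Apply every (pattern, replacement) pair to one line, in list order."""
    for pattern, new in pairs:
        line = pattern.sub(new, line)
    return line


def process_folder(folder, list_paths, glob="*.txt", suffix=".new"):
    """Write a processed copy (name + suffix) of each matching file in folder."""
    pairs = [pair for lp in list_paths for pair in load_replacements(lp)]
    for path in sorted(Path(folder).glob(glob)):
        lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
        result = "".join(apply_replacements(line, pairs) for line in lines)
        path.with_name(path.name + suffix).write_text(result, encoding="utf-8")
```

Called as process_folder("incoming", ["fixes.lst"]), with both names hypothetical, this would leave the originals untouched and write incoming/name.txt.new next to each input. Keeping the pairs in data files rather than in code means the regular tweaks for new misspellings only touch the lists.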