regex - Remove HTML Tags in C
In my program, I have downloaded a webpage with wget, and I want to extract the text on it as a string.
What should I do (if this is the right approach) to strip the HTML tags from the file so that I only have the text of the webpage?
I've never used regex in C, and I don't know if it is the right way or worth the trouble. Can you advise me on other alternatives, or libraries, that I can use? Or, if I should use regex, can you help me do the tag replacement in C?
sed -e 's/<[^>]\+>/ /g' file.html
thanks
Regular expressions aren't suited to parsing HTML. As long as you have XHTML that's guaranteed to be valid XML, you can use an XML parser library to parse it.