regex - Remove HTML Tags in C


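In my program, I have downloaded a webpage with wget, and I want to extract the text on it as a string.

What should I use (if this is the right approach) to strip the HTML tags from the file so that I am left with only the text of the webpage?

I've never used regex in C, and I don't know whether it is the right way to solve this problem. Can you advise me on other alternatives, or libraries, that I can use? Or, if I should use regex, can you help me do the tag replacement in C? On the command line, the equivalent would be: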

sed -e 's/<[^>]\+>/ /g' file.html 

Thanks.

Regular expressions aren't suited to parsing HTML. As long as you have XHTML that's guaranteed to be valid XML, you can use an XML parser library to parse it.
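If pulling in a parser is more than you need, the sed expression from the question can be roughly mirrored in plain C with a character scan and no regex library. The following is only a minimal sketch: `strip_tags` is a hypothetical helper name, and it shares the weaknesses of the regex approach (it does not understand comments, `<script>` bodies, entities such as `&amp;`, or a `>` inside an attribute value), so treat it as a quick filter rather than a real HTML parser.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of `html` in which
 * every "<...>" span (with at least one character between the brackets)
 * is replaced by a single space, similar to sed 's/<[^>]\+>/ /g'.
 * The caller must free() the result. Returns NULL if malloc fails. */
char *strip_tags(const char *html)
{
    size_t len = strlen(html);
    char *out = malloc(len + 1);   /* output is never longer than input */
    if (out == NULL)
        return NULL;

    size_t j = 0;
    for (size_t i = 0; i < len; ) {
        /* On '<', look for the next '>' after it. */
        const char *close = (html[i] == '<')
                          ? strchr(html + i + 1, '>')
                          : NULL;

        if (close != NULL && close > html + i + 1) {
            /* Found a non-empty tag: emit one space and skip past it. */
            out[j++] = ' ';
            i = (size_t)(close - html) + 1;
        } else {
            /* Ordinary character, unterminated '<', or "<>": copy as-is. */
            out[j++] = html[i++];
        }
    }
    out[j] = '\0';
    return out;
}
```

For example, calling `strip_tags("<p>Hello</p>")` yields the string `" Hello "`, while input such as `"a < b"` is returned unchanged because the `<` is never closed. Unlike sed, which works line by line, this scan also removes tags that span several lines.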

