regex - Remove HTML Tags in C


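In my program, I have downloaded a webpage with wget, and I want to extract the text on it as a string.

What should I do (if this is the right approach) to clear the HTML tags from the file so that I have only the text on the webpage?

I've never used regex in C and I don't know if it is the right way to go or worth the trouble. Can you advise me on other alternatives, or libraries, that I can use? Or, if I should use regex, can you help me do this tag replacement in C?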

sed -e 's/<[^>]\+>/ /g' file.html 

Thanks.

Regular expressions aren't suited to parsing HTML. As long as you have XHTML, which is guaranteed to be valid XML, you can use an XML parser library to parse it.
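
If all you need is a rough equivalent of the sed one-liner from the question, it can be done in plain C without a regex library. The sketch below is only that, a sketch: `strip_tags` is a hypothetical helper name, not something from a library, and it assumes the whole page is already in memory as a NUL-terminated string. It replaces every `<...>` run with a single space. Unlike sed, which works line by line, it also catches tags that span several lines. It does not understand comments, `<script>` or `<style>` contents, or entities such as `&amp;`, which is exactly why a real parser is the safer choice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from any library): copies `in` to `out`,
 * replacing every "<...>" run with a single space, like the sed command.
 * Assumption: `out` has room for at least strlen(in) + 1 bytes.
 * The result is never longer than the input. */
void strip_tags(const char *in, char *out)
{
    assert(in != NULL && out != NULL);

    while (*in != '\0') {
        if (*in == '<') {
            const char *close = strchr(in + 1, '>');
            /* Require at least one character between '<' and '>',
             * mirroring the "\+" in the sed pattern. */
            if (close != NULL && close > in + 1) {
                *out++ = ' ';      /* the whole tag becomes one space */
                in = close + 1;    /* continue right after the '>' */
                continue;
            }
        }
        *out++ = *in++;            /* ordinary character: copy unchanged */
    }
    *out = '\0';
}
```

To use it, read the downloaded file into a buffer, allocate a second buffer of the same size, and call `strip_tags(page, text)`. A lone `<` with no closing `>` after it is copied through unchanged.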

