python - Regular expression to extract and exclude data from a string

I have an HTML string that I want to extract data out of.

s="<ul><li>this bullet lev 1&nbsp;</li><li><ul><li><strong>&nbsp;this</strong> bullet lev&nbsp;</li></ul></li><li>&nbsp;<ul><li><ul><li>this bullet lev 3</li></ul></li></ul></li></ul></ul><strong></li>"

I want to extract the content of the <li> elements, such as the one with "this bullet lev 1" between its tags, but not elements that contain other <li> elements nested inside them, such as:

<li><ul><li><strong>&nbsp;this</strong> bullet lev&nbsp;</li></ul></li>

I have written a regular expression that does this:

<li>([\w &;/<>]*?)</li>

However, it ends up pulling unwanted data as well:

<li>this bullet lev 1&nbsp;</li> <li><ul><li><strong>&nbsp;this</strong> bullet lev&nbsp;</li> <li>&nbsp;<ul><li><ul><li>this bullet lev 3</li>

while I want it to pull:

<li>this bullet lev 1&nbsp;</li> <li><strong>&nbsp;this</strong> bullet lev&nbsp;</li> <li>&nbsp;<ul><li><ul><li>this bullet lev 3</li>

The idea is that I want to exclude any result that has a <li> inside the extracted data, and move ahead to the next one.

From my research I understood that I have to use a lookahead or lookbehind, but I gave it a couple of tries to no avail.

Any clues? I am using Python and the built-in re module.

I think this might do the job:

<li>((?!<li>).)*?</li>

It should match <li> followed by </li>, with anything in between as long as it doesn't contain <li> (using a negative lookahead).

This assumes you don't want <li> <ul><li><ul><li>this bullet lev 3</li>, but rather <li>this bullet lev 3</li>. The former is what your examples show, but the latter seems more consistent with your description.
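To try the lookahead pattern with the re module, something like the following sketch should work. Two things here are my own assumptions and not part of the question: the sample string is a well-formed version of s with the stray trailing tags dropped, and the group is made non-capturing (with re.DOTALL added) so that findall returns whole matches even if an item spans several lines.

```python
import re

# Assumption: a well-formed version of the string from the question
# (the stray trailing tags are dropped for the sake of the example).
s = ("<ul><li>this bullet lev 1&nbsp;</li>"
     "<li><ul><li><strong>&nbsp;this</strong> bullet lev&nbsp;</li></ul></li>"
     "<li>&nbsp;<ul><li><ul><li>this bullet lev 3</li></ul></li></ul></li></ul>")

# (?:(?!<li>).)*? consumes one character at a time, but only at positions
# where the upcoming text is not "<li>", so a match can never span a nested <li>.
# The group is non-capturing so that findall returns the whole match.
LEAF_LI = re.compile(r"<li>(?:(?!<li>).)*?</li>", re.DOTALL)

matches = LEAF_LI.findall(s)
for m in matches:
    print(m)
```

On that sample string this prints the three innermost items: <li>this bullet lev 1&nbsp;</li>, <li><strong>&nbsp;this</strong> bullet lev&nbsp;</li> and <li>this bullet lev 3</li>.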

That said, a parser is a better idea for this sort of thing, generally speaking.
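As a rough illustration of the parser route, here is a sketch using html.parser from the standard library. The class name LeafLiCollector and the sample string are hypothetical, made up for this example. It also returns the text of each innermost <li> rather than the markup, which may or may not be what you need.

```python
from html.parser import HTMLParser


class LeafLiCollector(HTMLParser):
    """Collect the text of every <li> that contains no nested <li>."""

    def __init__(self):
        super().__init__()  # convert_charrefs defaults to True: &nbsp; arrives as "\xa0"
        self._open = []     # one [text_parts, has_nested_li] entry per open <li>
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            if self._open:
                self._open[-1][1] = True  # the enclosing <li> is not a leaf
            self._open.append([[], False])

    def handle_data(self, data):
        if self._open:
            self._open[-1][0].append(data)  # text belongs to the innermost open <li>

    def handle_endtag(self, tag):
        if tag == "li" and self._open:
            parts, has_nested = self._open.pop()
            if not has_nested:
                self.items.append("".join(parts).strip())


# Assumption: a well-formed version of the string from the question.
s = ("<ul><li>this bullet lev 1&nbsp;</li>"
     "<li><ul><li><strong>&nbsp;this</strong> bullet lev&nbsp;</li></ul></li>"
     "<li>&nbsp;<ul><li><ul><li>this bullet lev 3</li></ul></li></ul></li></ul>")

collector = LeafLiCollector()
collector.feed(s)
collector.close()
leaf_items = collector.items
print(leaf_items)
```

This should print ['this bullet lev 1', 'this bullet lev', 'this bullet lev 3']. Unlike the regex, it does not care about attributes on the tags or about upper-case tag names, which is the main reason to prefer a parser here.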
