python 3.x - Python3 re.split() on a character which is not in a special substring

I'm trying to parse and validate a language. I want to tokenize the input and then check the grammar. The input string is:

something > 0 and (something contains "substr" or not something)

If I do this:

tokens = re.split(r"([\s()])", input)

I get this:

['something', ' ', '>', ' ', '0', ' ', 'and', ' ', '', '(', 'something', ' ', 'contains', ' ', '"substr"', ' ', 'or', ' ', 'not', ' ', 'something', ')', '']

which is exactly what I want. But there is always "something". If I replace "substr" with "substr with whitespace", the quoted string is split into this array, which is not the perfect result:

['"substr', ' ', 'with', ' ', 'whitespace"']

Is there a way to split it like the following?

['"substr with whitespace"']

Or how can I efficiently repair this "so close" split? Or maybe there is a different approach that I missed...
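
One possible way to keep the original pattern and still get the quoted string as a single token is to put an alternative for double-quoted strings in front of the character class, so that a quoted string is matched as a whole before any whitespace inside it can act as a separator. The sketch below is only an illustration: the names TOKEN_SPLIT and tokenize are made up for this example, the filtering of empty and whitespace-only pieces is an assumption about what the grammar check needs, and escaped quotes inside strings are not handled.

```python
import re

# Hypothetical pattern: a double-quoted string is tried first, then the
# original separators (whitespace and parentheses).
TOKEN_SPLIT = re.compile(r'("[^"]*"|[\s()])')

def tokenize(text):
    """Split on whitespace and parentheses, keeping quoted strings whole."""
    parts = TOKEN_SPLIT.split(text)
    # Assumption: empty strings and whitespace-only pieces are not needed.
    return [p for p in parts if p.strip()]

expr = 'something > 0 and (something contains "substr with whitespace" or not something)'
print(tokenize(expr))
```

If the whitespace tokens are needed, as in the original output, the filter can be changed to drop only the empty strings.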

I just think that splitting with

re.split(r"\s*(not|and|or|\(|\)|contains|<|>|=)\s*", input)

solves the problem.
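
For reference, here is a small sketch of how that pattern behaves on the example input. The helper name split_on_operators and the removal of empty strings are assumptions added for illustration. One caveat: the keywords are matched without word boundaries, so an identifier such as "operand" would be cut at "and"; the second pattern, which adds \b around the keywords, is one hypothetical way around that. Keywords inside a quoted string would still be split by both patterns.

```python
import re

# The pattern from the post: split on operators and keywords,
# swallowing the whitespace around them.
OPERATORS = re.compile(r"\s*(not|and|or|\(|\)|contains|<|>|=)\s*")

# Hypothetical variant: keywords must stand alone as whole words.
OPERATORS_WB = re.compile(r"\s*(\b(?:not|and|or|contains)\b|[()<>=])\s*")

def split_on_operators(text, pattern=OPERATORS):
    """Split text with the given pattern and drop the empty strings."""
    return [p for p in pattern.split(text) if p]

expr = 'something > 0 and (something contains "substr with whitespace" or not something)'
print(split_on_operators(expr))

# Without word boundaries, "and" is found inside "operand".
print(split_on_operators('operand > 0'))
print(split_on_operators('operand > 0', OPERATORS_WB))
```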
