python 3.x - Python3 re.split() on a character which is not inside a special substring


I'm trying to parse and validate a small language. I want to tokenize the input and then check the grammar. The input string is:

something > 0 and (something contains "substr" or not something)

If I do this:

tokens = re.split(r"([\s()])", input) 

I get this:

['something', ' ', '>', ' ', '0', ' ', 'and', ' ', '', '(', 'something', ' ', 'contains', ' ', '"substr"', ' ', 'or', ' ', 'not', ' ', 'something', ')', '']

Which is exactly what I want. But there is always a "something". If I replace "substr" with "substr with whitespace", that part of the array becomes this, which is not the result I want:

['"substr', ' ', 'with', ' ', 'whitespace"'] 

Is there a way to split it so that I get the following instead?

['"substr with whitespace"']

Or how can I efficiently repair this "so close" split? Or maybe there is a different approach that I missed...
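One possible approach, as a minimal sketch rather than a definitive answer: instead of splitting on separators, match the tokens themselves with re.findall and try a double-quoted string first, so whitespace inside the quotes never gets a chance to split it. The names TOKEN_RE and tokenize are hypothetical. The sketch assumes the language has no escaped quotes inside strings, and unlike the re.split() version above it drops the whitespace tokens.

```python
import re

# Hypothetical token pattern (not from the original post). Alternatives are
# tried left to right at each position:
#   "[^"]*"     a double-quoted string, whitespace included
#   [()]        a single parenthesis
#   [^\s()"]+   any other run without whitespace, parentheses or quotes
TOKEN_RE = re.compile(r'"[^"]*"|[()]|[^\s()"]+')


def tokenize(text):
    """Return the tokens of text, keeping quoted substrings in one piece."""
    return TOKEN_RE.findall(text)


text = 'something > 0 and (something contains "substr with whitespace" or not something)'
print(tokenize(text))
# ['something', '>', '0', 'and', '(', 'something', 'contains',
#  '"substr with whitespace"', 'or', 'not', 'something', ')']
```

Note that an unterminated quote is silently skipped by this pattern, so a real validator would probably want to detect that case separately.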

I think that just splitting with

re.split(r"\s*(not|and|or|\(|\)|contains|<|>|=)\s*", input) 

solved the problem.
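For illustration, here is a hedged sketch of that keyword-based split. Two things are additions that are not in the post: the \b word boundaries around the keywords (without them a name such as "origin" would be split at "or"), and the filtering of the empty strings that re.split() leaves between adjacent separators. The names SPLIT_RE and split_tokens are hypothetical.

```python
import re

# The split pattern from the post, with \b word boundaries added around the
# keywords. The outer group captures, so re.split keeps the separators.
SPLIT_RE = re.compile(r"\s*(\b(?:not|and|or|contains)\b|\(|\)|<|>|=)\s*")


def split_tokens(text):
    """Split on operators and keywords, dropping empty strings."""
    return [tok for tok in SPLIT_RE.split(text) if tok]


text = 'something > 0 and (something contains "substr with whitespace" or not something)'
print(split_tokens(text))
# ['something', '>', '0', 'and', '(', 'something', 'contains',
#  '"substr with whitespace"', 'or', 'not', 'something', ')']
```

One caveat: a keyword or operator inside a quoted string (for example "this or that") would still be treated as a separator by this pattern.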

