python 3.x - Python3 re.split() on a character that is not inside a special substring
I'm trying to parse and validate a language. I want to tokenize the input and then check the grammar. The input string is:
something > 0 and (something contains "substr" or not something)
If I do this:
tokens = re.split(r"([\s()])", input)
I get this:
['something', ' ', '>', ' ', '0', ' ', 'and', ' ', '', '(', 'something', ' ', 'contains',' ', '"substr"', ' ', 'or', ' ', 'not', ' ', 'something', ')', '']
which is exactly what I want. But there is always a "something": if I replace "substr" with "substr with whitespace", that part of the array comes out like this, which is not the result I want:
['"substr', ' ', 'with', ' ', 'whitespace"']
Is there a way to split it like the following?
['"substr with whitespace"']
Or how can I efficiently repair the "so close" split? Or maybe there is a different approach that I missed...
I think just splitting with
re.split(r"\s*(not|and|or|\(|\)|contains|<|>|=)\s*", input)
solves the problem.
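One thing to watch with that split: it leaves empty strings wherever two delimiters touch (for example between and and the opening parenthesis), and it would also cut a word such as sandbox at the embedded and. A possible alternative is to match the tokens directly with re.findall instead of splitting. This is only a sketch, assuming string literals are always double-quoted, contain no escaped quotes, and that operators are separated by whitespace as in the input above. The names TOKEN_RE and tokenize are made up for this illustration.

```python
import re

# Order matters: the quoted-string alternative is tried first, so spaces
# inside "..." never end a token. Parentheses are single-character tokens;
# anything else is a run of characters up to whitespace, a paren, or a quote.
TOKEN_RE = re.compile(r'"[^"]*"|[()]|[^\s()"]+')

def tokenize(text):
    """Return the tokens of text, skipping the whitespace between them."""
    return TOKEN_RE.findall(text)

expr = 'something > 0 and (something contains "substr with whitespace" or not something)'
print(tokenize(expr))
```

For the input above this gives 'something', '>', '0', 'and', '(', 'something', 'contains', '"substr with whitespace"', 'or', 'not', 'something', ')' with no empty strings or whitespace tokens to filter out. An unterminated quote is not reported as an error by this pattern, so the grammar check would have to catch that separately.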