regex - 'u' character appears within a regular expression in Python
I have a few lines of code that extract email addresses from a PDF file.
    for page in pdf.pages:
        pdf = page.extractText()
        # print elpdf
        r = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')
        results = r.findall(pdf)
        listemail.append(results)
        print(listemail[0:])
    pdf.stream.close()
Unfortunately, after running the code I have noticed that the results are not quite right: a 'u' character appears every time a match is found:
    [[u'testuser1@training.local']]
    [[u'testuser2@training.local']]
Does anyone know how to avoid this character appearing?
Thanks in advance.
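As others have noted, this is not a bug, it is a feature. The u prefix is only how Python 2 displays a unicode string when it prints a list; it is not part of the matched text.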
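To make that concrete, here is a minimal sketch using the pattern from the question. The sample text and the names EMAIL_RE, matches and joined are made up for illustration. Printing the list shows the repr() of each element, which on Python 2 includes the u prefix, while printing the elements themselves shows only their contents.

```python
import re

# Pattern from the question, with the character class written as a-zA-Z.
EMAIL_RE = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')

# Hypothetical sample text standing in for the text extracted from a page.
text = u'Contact testuser1@training.local or testuser2@training.local'
matches = EMAIL_RE.findall(text)

# Printing the list shows the repr() of every element.
# On Python 2 that repr carries the u prefix; on Python 3 it does not.
print(matches)

# Printing each element shows only the characters of the string.
for match in matches:
    print(match)

# Joining the elements also produces plain text without any prefix.
joined = u', '.join(matches)
print(joined)
```

If the only concern is how the output looks, printing the items one by one (or joining them) may be enough, with no conversion needed.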
If you want non-unicode encoded strings, you can convert the text from unicode to something more palatable. This Stack Overflow Q/A covers the subject:
Convert a Unicode string to a string in Python (containing symbols)
I've run into this before, and in some use cases it can be problematic, because you encounter issues when a method expects a non-unicode string and breaks. :)
Example solutions from the link:
    >>> a=u'aaa'
    >>> a
    u'aaa'
    >>> a.encode('ascii','ignore')
    'aaa'
    >>> a.encode('utf8','ignore')
    'aaa'
    >>> str(a)
    'aaa'
    >>>
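Applied to a list of matches, the same conversion might look like the sketch below. The helper name encode_all and the sample addresses are assumptions for illustration, not part of the linked answer. One thing worth knowing: with 'ascii' and 'ignore', any non-ASCII character is dropped silently, so UTF-8 is usually the safer choice when the addresses may contain such characters.

```python
def encode_all(strings, encoding='utf-8', errors='strict'):
    """Encode each unicode string in `strings` to a byte string.

    Hypothetical helper: on Python 2 the result is a list of str,
    on Python 3 a list of bytes.
    """
    return [s.encode(encoding, errors) for s in strings]


# Sample matches, as findall() might return them.
emails = [u'testuser1@training.local', u'testuser2@training.local']
encoded = encode_all(emails)

# 'ignore' discards characters the target encoding cannot represent,
# so the e-acute below disappears from the ASCII result.
lossy = encode_all([u'caf\xe9@training.local'], 'ascii', 'ignore')

# UTF-8 can represent the character, so nothing is lost.
lossless = encode_all([u'caf\xe9@training.local'])
```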