regex - 'u' character appears in the results of a regular expression in Python


I have a few lines of code that extract email addresses from a PDF file.

    import re

    for page in pdf.pages:
        # Use a separate name so the reader object `pdf` is not overwritten
        text = page.extractText()
        # print elpdf
        r = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')
        results = r.findall(text)
        listemail.append(results)
        print(listemail[0:])

    pdf.stream.close()

Unfortunately, after running the code I noticed that the results are not quite right: a 'u' character appears every time a match is found:

    [[u'testuser1@training.local']]
    [[u'testuser2@training.local']]

Does anyone know how to avoid this character appearing?

Thanks in advance.

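As others have noted, this is not a bug, it's a feature. The u prefix only marks the value as a unicode string in Python 2; it is not part of the matched text.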
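A minimal sketch of why no conversion is needed for clean output: printing a list shows the repr of each element (which includes the u prefix on Python 2), while printing each string on its own does not. The names EMAIL_RE, extract_emails and pages are hypothetical, and the sample page text stands in for whatever the PDF library returns.

```python
import re

# Same pattern as in the question, compiled once.
EMAIL_RE = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')


def extract_emails(page_texts):
    """Return one flat list of addresses found in the given page texts."""
    emails = []
    for text in page_texts:
        # extend() keeps the list flat; append() would nest one list per page
        emails.extend(EMAIL_RE.findall(text))
    return emails


# Hypothetical stand-in for the text extracted from each PDF page.
pages = [
    u"Contact: testuser1@training.local",
    u"Also testuser2@training.local",
]

emails = extract_emails(pages)

# Printing each string individually shows the address without any prefix.
for email in emails:
    print(email)
```

Using extend() instead of append() also avoids the nested [[...]] lists seen in the question's output.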

If you want non-unicode encoded strings, you can convert the text from unicode to something more palatable. This Stack Overflow Q/A covers the subject:

Convert a Unicode string to a string in Python (containing extra symbols)

I've run into this before, and in some use cases it can be problematic: you encounter issues when a method expects a non-unicode string and breaks. :)

Example solutions from that link:

    >>> a = u'aaa'
    >>> a
    u'aaa'
    >>> a.encode('ascii', 'ignore')
    'aaa'
    >>> a.encode('utf8', 'ignore')
    'aaa'
    >>> str(a)
    'aaa'
