Odd character appending to the front of Python list
I'm having an issue running Python on Linux. I'm trying to learn Python, and I wanted to parse a small XML file and put the tags and data into a list. Every time I run the code, a 'u' is added to the front of each element in the list.
    [u'world']
    defaultdict(<type 'list'>, {u'world': [u'data']})

My code is as follows:
    import xml.sax
    from collections import defaultdict

    class transformxml(xml.sax.ContentHandler):

        def __init__(self):
            self.start_tag_name = -1
            self.tag_data = -1
            self.mydict = defaultdict(list)
            self.tags = []

        # Called by the parser at each opening tag.
        def startElement(self, name, attrs):
            self.start_tag_name = name
            print name
            print self.start_tag_name

        # Called by the parser for text between tags; skip whitespace-only chunks.
        def characters(self, content):
            if content.strip(' \r\n\t') != "":
                self.tag_data = content.strip(' \r\n\t')
                print self.start_tag_name
                self.tags.append(self.start_tag_name)
                self.mydict[self.start_tag_name].append(content.strip(' \r\n\t'))

        def endElement(self, name):
            pass

        def __del__(self):
            if self.mydict:
                del self.mydict
                print "deleteing mydict"

Does anyone know what the issue might be?
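That 'weird' symbol means the string is a Unicode string. The u is not added to your data; it is only how Python 2 displays a unicode object when it shows the contents of a list or dict.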
E.g., if you have the string test:
    >>> unicode('test')
    u'test'
    >>> s = unicode('test')
    >>> type(s)
    <type 'unicode'>

The Python documentation covers this in more detail.
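As a rough illustration of why the stored data is fine, here is a minimal sketch of a SAX handler that collects tag text and prints each item on its own instead of printing the whole container. It is written for Python 3, where every str is already Unicode and the prefix no longer shows up at all; the class name `TagCollector`, its attributes, and the sample XML are made up for this example and are not the code from the question.

```python
import xml.sax
from collections import defaultdict


class TagCollector(xml.sax.ContentHandler):
    """Hypothetical handler: collects the text found under each tag name."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.current_tag = None
        self.data = defaultdict(list)

    def startElement(self, name, attrs):
        # Remember which element the following text belongs to.
        self.current_tag = name

    def characters(self, content):
        # Ignore whitespace-only chunks between elements.
        text = content.strip()
        if text:
            self.data[self.current_tag].append(text)


handler = TagCollector()
xml.sax.parseString(b"<world>data</world>", handler)

# Printing each item uses str(), so only the text itself is shown.
for tag, values in handler.data.items():
    for value in values:
        print(tag, value)
```

On Python 2 the same idea applies: printing an individual unicode string shows just its text, while printing the list or dict that holds it shows the repr of each element, prefix included.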
To sum up, according to the Python docs,
...a Unicode string is a sequence of code points, which are numbers from 0 to 0x10ffff. This sequence needs to be represented as a set of bytes (meaning, values from 0-255) in memory. The rules for translating a Unicode string into a sequence of bytes are called an encoding.
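The distinction between code points and bytes can be sketched in a few lines. This is only an illustration, written for Python 3; the sample string and the variable names are arbitrary choices, not something from the question.

```python
# The u prefix is still accepted on Python 3, but it is optional there.
text = u"caf\u00e9"

# A Unicode string is a sequence of code points (plain integers).
code_points = [ord(ch) for ch in text]

# An encoding turns those code points into bytes (values 0-255).
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

print(code_points)         # [99, 97, 102, 233]
print(list(utf8_bytes))    # [99, 97, 102, 195, 169]
print(list(latin1_bytes))  # [99, 97, 102, 233]

# Decoding with the same encoding gives back the original string.
print(utf8_bytes.decode("utf-8") == text)  # True
```

The same four code points become five bytes in UTF-8 and four bytes in Latin-1, which is why the encoding has to be known before bytes can be turned back into text.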