python - Using text(), is there a way to convert empty text to 'None' with Scrapy?
I'm running into a problem. The XML from the website I'm scraping has some values that are empty, and I need to preserve the order of the values.
Sample:
    <thedata>
      <some-item>
        <value xsi:nil="true"/>
        <value xsi:nil="true"/>
        <value xsi:nil="true"/>
        <value xsi:nil="true"/>
        <value xsi:nil="true"/>
        <value>44</value>
        <value>32</value>
        <value>31</value>
        <value xsi:nil="true"/>
        <value xsi:nil="true"/>
        <value>32</value>
        <value>31</value>
        <value>34</value>
        <value>34</value>
        <value>33</value>
      </some-item>
    </thedata>

Doing text() ignores the empty values:
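    # Import path for the old Scrapy releases this code targets
    # (it is scrapy.spiders in current releases).
    from scrapy.contrib.spiders import XMLFeedSpider

    class MySpider(XMLFeedSpider):
        name = 'myspider'
        start_urls = ['http://www.example.com/somexml.xml']
        itertag = 'thedata'

        # Using XMLFeedSpider
        def parse_node(self, response, node):
            # text() only matches <value> elements that actually contain text
            item_vals = node.select('some-item/value/text()').extract()
            print item_vals

This prints a list that contains only the values that have an integer.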
Since I need to preserve the order, is there a way to tell Scrapy to replace empty values with '' or None?
Edit: @unutbu: I'm still getting the same problem:
    item_vals = node.select('some-item/value/text()').extract()
    print item_vals
    item_vals2 = node.select('some-item/value/text()').extract() or None
    print item_vals2

Output:
    [u'44', u'32', u'31', u'32', u'31', u'34', u'34', u'33']
    [u'44', u'32', u'31', u'32', u'31', u'34', u'34', u'33']

What I want is:
    [None, None, None, None, None, u'44', u'32', u'31', None, None, u'32', u'31', u'34', u'34', u'33']

or anything else that represents an empty value when one is encountered.
You need to select all of the value nodes, and then extract the text (if any) from each piece:
    [txt
     for item in hxs.select('some-item/value')
     for txt in item.select('text()').extract() or [u'']]
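The reason `.extract() or None` in the question changes nothing is that `or` is applied to the whole list, which is non-empty, so the fallback never fires. The fallback has to be applied once per `value` node. As a rough, standalone illustration of that per-node pattern, here is a minimal sketch that uses the standard library's `xml.etree.ElementTree` instead of Scrapy's selectors, and Python 3 rather than the Python 2 of the snippets above. The function name `values_in_order`, the `placeholder` parameter, the shortened `SAMPLE` document, and the `xmlns:xsi` declaration on the root element (needed for the sample to parse on its own) are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Shortened version of the sample feed. The xmlns:xsi declaration is an
# assumption added here so that the xsi:nil attributes parse standalone.
SAMPLE = """<thedata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <some-item>
    <value xsi:nil="true"/>
    <value>44</value>
    <value xsi:nil="true"/>
    <value>32</value>
  </some-item>
</thedata>"""


def values_in_order(xml_text, placeholder=None):
    """Return one entry per <value> node, in document order.

    Nodes without text yield `placeholder` instead of being skipped.
    """
    root = ET.fromstring(xml_text)
    result = []
    # Iterate over the nodes themselves, not over their text, so that
    # empty nodes still take up a slot in the result.
    for node in root.findall("some-item/value"):
        result.append(node.text if node.text is not None else placeholder)
    return result


print(values_in_order(SAMPLE))      # [None, '44', None, '32']
print(values_in_order(SAMPLE, ""))  # ['', '44', '', '32']
```

In the Scrapy comprehension above, the same effect comes from the `or [u'']` fallback on each node's extracted text. Swapping it for `or [None]` should give None in the empty slots instead of an empty string.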