python - Scrapy creating XML feed wraps content in "value" tags


I've had a bit of help on here and my code pretty much works. The issue is that, in the process of generating the XML, it wraps my content in "value" tags when I don't want it to. According to the docs, this is due to the following:

Unless overridden in the :meth:`serialize_field` method, multi-valued fields are exported by serializing each value inside a <value> element. This is for convenience, as multi-valued fields are very common.

This is my output:

<?xml version="1.0" encoding="utf-8"?>
<items>
   <item>
      <body>
         <value>don't forget me weekend!</value>
      </body>
      <to>
         <value>tove</value>
      </to>
      <who>
         <value>jani</value>
      </who>
      <heading>
         <value>reminder</value>
      </heading>
   </item>
</items>

What I send to the XML exporter seems to be this, so I don't know why it thinks the fields are multi-valued:

{'body': [u"don't forget me weekend!"],
 'heading': [u'reminder'],
 'to': [u'tove'],
 'who': [u'jani']}
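Every value in that dictionary is a one-element list, not a string, and that is what triggers the multi-valued rule quoted from the docs. The distinction can be reproduced without Scrapy. The sketch below is a simplified stand-in built on the standard library's ElementTree, not Scrapy's actual exporter code, and the helper names `export_field` and `export_item` are made up for illustration:

```python
import xml.etree.ElementTree as ET


def export_field(parent, name, value):
    """Simplified stand-in for the documented rule (not Scrapy's real code)."""
    field = ET.SubElement(parent, name)
    if isinstance(value, (list, tuple)):
        # Multi-valued field: each entry gets its own <value> child.
        for entry in value:
            ET.SubElement(field, "value").text = str(entry)
    else:
        # Single value: the text goes directly inside the field element.
        field.text = str(value)
    return field


def export_item(item):
    """Serialize a dict of fields into one <item> element."""
    root = ET.Element("item")
    for name in sorted(item):
        export_field(root, name, item[name])
    return ET.tostring(root, encoding="unicode")


as_list = export_item({"to": ["tove"]})
as_text = export_item({"to": "tove"})
print(as_list)  # <item><to><value>tove</value></to></item>
print(as_text)  # <item><to>tove</to></item>
```

A list with a single entry still counts as multi-valued, so the wrapper appears even when only one string was scraped.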

pipeline.py

from scrapy import signals
from scrapy.contrib.exporter import XmlItemExporter


class XmlExportPipeline(object):

    def __init__(self):
        self.files = {}

    @classmethod
    def from_crawler(cls, crawler):
        # Hook the pipeline into the spider open/close signals.
        pipeline = cls()
        crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
        crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
        return pipeline

    def spider_opened(self, spider):
        file = open('%s_products.xml' % spider.name, 'w+b')
        self.files[spider] = file
        self.exporter = XmlItemExporter(file)
        self.exporter.start_exporting()

    def spider_closed(self, spider):
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        file.close()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item

spider.py

from scrapy.contrib.spiders import XMLFeedSpider
from crawler.items import CrawlerItem


class SiteSpider(XMLFeedSpider):
    name = 'site'
    allowed_domains = ['www.w3schools.com']
    start_urls = ['http://www.w3schools.com/xml/note.xml']
    itertag = 'note'

    def parse_node(self, response, selector):
        item = CrawlerItem()
        # Each extract() call returns a list, even for a single match.
        item['to'] = selector.xpath('//to/text()').extract()
        item['who'] = selector.xpath('//from/text()').extract()
        item['heading'] = selector.xpath('//heading/text()').extract()
        item['body'] = selector.xpath('//body/text()').extract()
        return item

Any help is appreciated. I want the same output without the redundant tags.

The extract() method always returns a list of values, even if there is only a single value in the result, for example [4] or [3, 4, 5], or an empty list when nothing matches. To avoid this, if you know there is only one value, you can select it like this:

item['to'] = selector.xpath('//to/text()').extract()[0] 

Note: be aware that this raises an IndexError when extract() finds no match and returns an empty list, because there is no first element to index. In such uncertain cases, a useful trick is:

item['to'] = (selector.xpath('...').extract() or [''])[0] 

Or you can write a custom function to get the first element:

def extract_first(selector, default=None):
    val = selector.extract()
    return val[0] if val else default

This way you can have a default value in case the desired value is not found:

item['to'] = extract_first(selector.xpath(...))  # first match, or None
item['to'] = extract_first(selector.xpath(...), 'not-found')  # first match, or 'not-found'
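To check the helper without running a crawl, here is a minimal sketch that swaps the real selector for a small stub. `FakeSelectorList` is a hypothetical class invented for this example. It only mimics the one behaviour the helper relies on, namely that extract() returns a list that may be empty:

```python
class FakeSelectorList(object):
    """Hypothetical stand-in for the object returned by selector.xpath(...)."""

    def __init__(self, values):
        self._values = list(values)

    def extract(self):
        # Assumed to behave like the real thing: always a list, possibly empty.
        return list(self._values)


def extract_first(selector, default=None):
    """Return the first extracted value, or `default` if nothing matched."""
    val = selector.extract()
    return val[0] if val else default


found = FakeSelectorList(["tove"])
missing = FakeSelectorList([])

print(extract_first(found))                 # tove
print(extract_first(missing))               # None
print(extract_first(missing, "not-found"))  # not-found

# The inline variant falls back to an empty string in the same way.
inline_fallback = (missing.extract() or [""])[0]
print(repr(inline_fallback))                # ''
```

Because the helper hands the exporter a plain string instead of a list, the field is no longer treated as multi-valued. Newer Scrapy releases also provide an extract_first() method on selector lists, so if your version has it, the custom helper may not be needed.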
