Python: Scrapy exports raw data instead of text() only?
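I'm exporting this class:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
# DozenItem is defined in the project's items.py (import path not shown)

class MySpider(BaseSpider):
    name = "dozen"
    allowed_domains = ["yahoo.com"]
    start_urls = ["http://finance.yahoo.com/q/is?s=scmp+income+statement&annual"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # Every right-aligned cell on the page
        revenue = hxs.select('//td[@align="right"]')
        items = []
        for rev in revenue:
            item = DozenItem()
            # Stores the selector itself, not its text
            item["revenue"] = rev.xpath("./strong/text()")
            items.append(item)
        return items[:7]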
and I'm getting this:
[<HtmlXPathSelector xpath='./strong/text()' data=u'\n 115,450\xa0\xa0\n '>]
but I want just 115,450.
If I add .extract() to the end of the item["revenue"] line, it exports nothing.
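Calling .extract() turns a selector into a list of plain strings: for the cell above, something like [u'\n 115,450\xa0\xa0\n '], and an empty list for any matched cell that has no <strong> text. A minimal stdlib sketch of tidying such a list, assuming it has the shape shown in the output above; clean_cell is a hypothetical helper, not part of Scrapy.

```python
def clean_cell(extracted):
    """Join the strings returned by .extract() and trim surrounding whitespace.

    In Python 3, str.strip() also removes non-breaking spaces (U+00A0),
    the characters the page puts after each number.
    """
    return "".join(extracted).strip()


# Assumed shape of rev.xpath("./strong/text()").extract() for the cell above
raw = ["\n 115,450\xa0\xa0\n "]
print(clean_cell(raw))        # → 115,450

# A right-aligned cell without <strong> gives an empty list, hence an empty value
print(repr(clean_cell([])))   # → ''
```

If the export looks empty, checking how many of the first items hold an empty list is a quick way to tell whether the XPath is matching the wrong cells.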
Here is the section of HTML that includes what I'm trying to grab:
<tr>
  <td colspan="2"> <strong>total revenue</strong> </td>
  <td align="right"> <strong>115,450 </strong> </td>
  <td align="right"> <strong>89,594 </strong> </td>
  <td align="right"> <strong>81,487 </strong> </td>
</tr>
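To see what an XPath like //td[@align="right"]/strong/text() picks out of this row, the standard library's ElementTree, which supports a limited XPath subset including [@attr='value'] predicates, can be run on the snippet. This is only an offline stand-in for Scrapy's selectors and assumes well-formed markup (a full Yahoo page would not parse this way). The \xa0 characters are an assumption added to mirror the non-breaking spaces in the scraped output; right_aligned_values is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The row from the question, with literal non-breaking spaces (assumed)
ROW = (
    '<tr> <td colspan="2"> <strong>total revenue</strong> </td>'
    ' <td align="right"> <strong>115,450\xa0\xa0</strong> </td>'
    ' <td align="right"> <strong>89,594\xa0\xa0</strong> </td>'
    ' <td align="right"> <strong>81,487\xa0\xa0</strong> </td> </tr>'
)


def right_aligned_values(row_markup):
    """Mimic //td[@align="right"]/strong/text() on one well-formed row."""
    row = ET.fromstring(row_markup)
    # Only <strong> children of right-aligned cells; the label cell is skipped
    return [strong.text.strip()
            for strong in row.findall(".//td[@align='right']/strong")
            if strong.text]


print(right_aligned_values(ROW))  # → ['115,450', '89,594', '81,487']
```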
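You're trying to use too broad an XPath expression for your first selection: //td[@align="right"] matches every right-aligned cell on the page, including any that have no <strong> child, and ./strong/text() returns nothing for those. Select the text nodes directly and let a regular expression pull out the number. Try this: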
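def parse(self, response):
    # Select the text nodes inside the <strong> elements directly
    revenue = response.xpath('//td[@align="right"]/strong/text()')
    items = []
    for rev in revenue:
        item = DozenItem()
        # .re() returns the matching strings, e.g. [u'115,450'],
        # so no separate .extract() is needed
        item["revenue"] = rev.re(r'\d*,\d*')
        items.append(item)
    return items[:3]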
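The answer's rev.re(...) applies a regular expression to each selected text node and returns the matching strings. Python's re module behaves the same way on the raw text from the question, which makes it easy to check the pattern offline. This is a stdlib stand-in for Scrapy's .re(), not Scrapy itself; the stricter NUMBER pattern and the parse_amounts helper are hypothetical additions for turning the strings into integers.

```python
import re

RAW = "\n 115,450\xa0\xa0\n "  # raw text node from the question's output

# The answer's pattern: optional digits, a comma, optional digits
LOOSE = re.compile(r"\d*,\d*")

# Hypothetical stricter pattern: comma-grouped thousands, or plain digits
NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+|\d+")


def parse_amounts(text):
    """Return every number in text as an int, dropping thousands separators."""
    return [int(match.replace(",", "")) for match in NUMBER.findall(text)]


print(LOOSE.findall(RAW))           # → ['115,450']
print(LOOSE.findall("n/a, -"))      # → [','] (a bare comma also matches)
print(parse_amounts(RAW))           # → [115450]
print(parse_amounts("1,115,450"))   # → [1115450]
```

The loose pattern is fine for this table, but it also matches a lone comma and would miss a value with no comma (such as 450); the stricter pattern avoids both, at the cost of being harder to read.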