web crawler - Scrapy python error - Missing scheme in request URL
I'm trying to pull a file from a password-protected FTP server. This is the code I'm using:
    import scrapy
    from scrapy.contrib.spiders import XMLFeedSpider
    from scrapy.http import Request
    from crawler.items import CrawlerItem

    class SiteSpider(XMLFeedSpider):
        name = 'site'
        allowed_domains = ['ftp.site.co.uk']
        itertag = 'item'

        def start_requests(self):
            yield Request('ftp.site.co.uk/feed.xml',
                          meta={'ftp_user': 'test', 'ftp_password': 'test'})

        def parse_node(self, response, selector):
            item = CrawlerItem()
            item['title'] = (selector.xpath('//title/text()').extract() or [''])[0]
            return item
This is the traceback I get:
    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/twisted/internet/base.py", line 1192, in run
        self.mainLoop()
      File "/usr/local/lib/python2.7/dist-packages/twisted/internet/base.py", line 1201, in mainLoop
        self.runUntilCurrent()
      File "/usr/local/lib/python2.7/dist-packages/twisted/internet/base.py", line 824, in runUntilCurrent
        call.func(*call.args, **call.kw)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/reactor.py", line 41, in __call__
        return self._func(*self._a, **self._kw)
    --- <exception caught here> ---
      File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 112, in _next_request
        request = next(slot.start_requests)
      File "/var/www/spider/crawler/spiders/site.py", line 13, in start_requests
        meta={'ftp_user': 'test', 'ftp_password': 'test'})
      File "/usr/local/lib/python2.7/dist-packages/scrapy/http/request/__init__.py", line 26, in __init__
        self._set_url(url)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/http/request/__init__.py", line 61, in _set_url
        raise ValueError('Missing scheme in request url: %s' % self._url)
    exceptions.ValueError: Missing scheme in request url: ftp.site.co.uk/feed.xml
You need to add a scheme to the URL:
ftp://ftp.site.co.uk
The FTP URL syntax is defined as:
ftp://[<user>[:<password>]@]<host>[:<port>]/<url-path>
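To see how that syntax maps onto the pieces of a URL, here is a small sketch using the standard library's `urlsplit`. It assumes Python 3 (on Python 2.7 the same function lives in the `urlparse` module), and the credentials and port are made-up values for illustration only. It also shows that the URL from the question has no scheme at all, which is what the error message is complaining about.

```python
from urllib.parse import urlsplit

# A full FTP URL; user, password and port here are hypothetical values.
full = urlsplit('ftp://test:secret@ftp.site.co.uk:2121/feed.xml')
print(full.scheme)    # ftp
print(full.username)  # test
print(full.password)  # secret
print(full.hostname)  # ftp.site.co.uk
print(full.port)      # 2121
print(full.path)      # /feed.xml

# The URL from the question: without "ftp://" the parser finds no scheme
# and treats the whole string as a path.
bare = urlsplit('ftp.site.co.uk/feed.xml')
print(repr(bare.scheme))  # ''
print(bare.path)          # ftp.site.co.uk/feed.xml
```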
Basically, you do this:
    yield Request('ftp://ftp.site.co.uk/feed.xml', ...)
Read more about URI schemes on Wikipedia: http://en.wikipedia.org/wiki/URI_scheme
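If your URLs come from a config file or user input and may or may not include the scheme, you could guard against this error before building the request. The sketch below is only one way to do it: `ensure_scheme` is a hypothetical helper (not part of Scrapy), and it uses a simple `'://'` check rather than full URL parsing, which is an assumption that holds for ordinary `ftp://` and `http://` style URLs.

```python
def ensure_scheme(url, default='ftp'):
    """Return url unchanged if it already has a scheme, else prepend default://.

    Hypothetical helper for illustration; it only looks for '://'.
    """
    if '://' in url:
        return url
    return default + '://' + url


# The bare host/path from the question gets the ftp:// prefix added.
fixed = ensure_scheme('ftp.site.co.uk/feed.xml')
print(fixed)  # ftp://ftp.site.co.uk/feed.xml

# A URL that already has a scheme is left alone.
unchanged = ensure_scheme('ftp://ftp.site.co.uk/feed.xml')
print(unchanged)  # ftp://ftp.site.co.uk/feed.xml
```

You would then pass the result to `Request(...)` in `start_requests` instead of the raw string.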