web crawler - Scrapy python error - Missing scheme in request URL -


I'm trying to pull a file from a password protected FTP server. This is the code I'm using:

import scrapy
from scrapy.contrib.spiders import XMLFeedSpider
from scrapy.http import Request
from crawler.items import CrawlerItem

class SiteSpider(XMLFeedSpider):
    name = 'site'
    allowed_domains = ['ftp.site.co.uk']
    itertag = 'item'

    def start_requests(self):
        yield Request('ftp.site.co.uk/feed.xml',
            meta={'ftp_user': 'test', 'ftp_password': 'test'})

    def parse_node(self, response, selector):
        item = CrawlerItem()
        item['title'] = (selector.xpath('//title/text()').extract() or [''])[0]
        return item

This is the traceback error I get:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/base.py", line 1192, in run
    self.mainLoop()
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/base.py", line 1201, in mainLoop
    self.runUntilCurrent()
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/base.py", line 824, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/reactor.py", line 41, in __call__
    return self._func(*self._a, **self._kw)
--- <exception caught here> ---
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 112, in _next_request
    request = next(slot.start_requests)
  File "/var/www/spider/crawler/spiders/site.py", line 13, in start_requests
    meta={'ftp_user': 'test', 'ftp_password': 'test'})
  File "/usr/local/lib/python2.7/dist-packages/scrapy/http/request/__init__.py", line 26, in __init__
    self._set_url(url)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/http/request/__init__.py", line 61, in _set_url
    raise ValueError('Missing scheme in request url: %s' % self._url)
exceptions.ValueError: Missing scheme in request url: ftp.site.co.uk/feed.xml

You need to add the scheme to the URL:

ftp://ftp.site.co.uk 

The FTP URL syntax is defined as:

ftp://[<user>[:<password>]@]<host>[:<port>]/<url-path> 
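Purely as an illustration of that generic syntax (the spider above passes its credentials through request meta rather than embedding them in the URL), the test credentials from the question would map onto it like this:

ftp://test:test@ftp.site.co.uk:21/feed.xml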

Basically, do this:

yield Request('ftp://ftp.site.co.uk/feed.xml', ...) 
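As a minimal sketch, here is the spider from the question with only the URL changed (parse_node and the item class stay as they were):

from scrapy.contrib.spiders import XMLFeedSpider
from scrapy.http import Request

class SiteSpider(XMLFeedSpider):
    name = 'site'
    allowed_domains = ['ftp.site.co.uk']
    itertag = 'item'

    def start_requests(self):
        # 'ftp://' scheme added so Request can parse the URL;
        # the credentials are still supplied via the ftp_user/ftp_password meta keys
        yield Request('ftp://ftp.site.co.uk/feed.xml',
                      meta={'ftp_user': 'test', 'ftp_password': 'test'})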

Read more about schemes at Wikipedia: http://en.wikipedia.org/wiki/URI_scheme

