Info: Crawled 0 Pages (at 0 Pages/min), Scraped 0 Items (at 0 Items/min)
I just began to learn Python and Scrapy. My first project is to crawl information on a website containing web security information. But when I run it using cmd, it says "Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)".
Solution 1:
There are two things to correct to make it work:

- You need to define the FEED_URI setting with the path where you want to store the result.
- You need to use response in parse_subpage, because the logic is the following: Scrapy downloads "https://www.imovirtual.com/arrendar/apartamento/lisboa/" and gives the response to parse; there you extract the ad URLs and ask Scrapy to download each page and give the downloaded pages to parse_subpage. So response in parse_subpage corresponds to a page such as https://www.imovirtual.com/anuncio/t0-totalmente-remodelado-localizacao-excelente-IDGBAY.html#913474cdaa, for example.
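The parse -> Request -> parse_subpage handoff described above can be sketched without Scrapy at all. The classes below are simplified stand-ins (not the real scrapy.Request/Response APIs): the point is only to show how meta carries a partially built item from the listing callback into the detail-page callback.

```python
class FakeRequest:
    """Stand-in for scrapy.Request: remembers url, callback, and meta."""
    def __init__(self, url, callback, meta):
        self.url = url
        self.callback = callback
        self.meta = meta


class FakeResponse:
    """Stand-in for a Scrapy Response: carries the request's meta dict."""
    def __init__(self, url, meta=None):
        self.url = url
        self.meta = meta or {}


def parse(response):
    # Listing page: build a partial item, then request the detail page,
    # passing the item along in meta.
    item = {'preco': '1000 EUR'}
    yield FakeRequest('https://www.imovirtual.com/anuncio/example.html',
                      callback=parse_subpage,
                      meta={'item': item})


def parse_subpage(response):
    # Detail page: 'response' here is the *subpage*, so response.meta
    # holds the item started in parse().
    item = response.meta.get('item')
    item['info'] = ['extracted from ' + response.url]
    yield item


# Tiny stand-in "engine": for each yielded request, pretend to download
# the page and hand the resulting response to the request's callback.
items = []
for req in parse(FakeResponse('https://www.imovirtual.com/arrendar/apartamento/lisboa/')):
    items.extend(req.callback(FakeResponse(req.url, meta=req.meta)))
print(items)
```

In real Scrapy the engine does the downloading and callback dispatch for you; the shape of the two callbacks is the same.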
This should work:
import scrapy


class SapoSpider(scrapy.Spider):
    name = "imo"
    allowed_domains = ["imovirtual.com"]
    start_urls = ["https://www.imovirtual.com/arrendar/apartamento/lisboa/"]
    custom_settings = {
        'FEED_URI': './output.json'
    }

    def parse(self, response):
        for i in response.css('div.offer-item-details'):
            youritem = {
                'preco': i.css('span.offer-item-title::text').extract_first(),
                'autor': i.css('li.offer-item-price::text').extract(),
                'data': i.css('li.offer-item-area::text').extract(),
                'data_2': i.css('li.offer-item-price-perm::text').extract()
            }
            # Yield the request here so each request carries its own item;
            # collecting links first and looping afterwards would attach
            # the last item to every request.
            subpage_link = i.css('header.offer-item-header a::attr(href)').extract_first()
            if subpage_link:
                yield scrapy.Request(subpage_link,
                                     callback=self.parse_subpage,
                                     meta={'item': youritem})

    def parse_subpage(self, response):
        youritem = response.meta.get('item')
        youritem['info'] = response.css('ul.dotted-list li.h4::text').extract()
        yield youritem
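One caveat about the export setting: in Scrapy 2.1 and later, FEED_URI is deprecated in favor of the FEEDS dictionary setting. An equivalent custom_settings block (a config fragment, assuming JSON output is wanted) would be:

```python
custom_settings = {
    'FEEDS': {
        './output.json': {'format': 'json'},
    },
}
```

Either way, run the spider with "scrapy crawl imo" from inside a Scrapy project, or save the file and use "scrapy runspider yourfile.py" to run it standalone.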