Nov-13-2020, 09:37 AM
Hello,
The following code is a web spider from a Python tutorial, but my question is not about web spiders; it is a general Python question.
I'd like to know the usage of format() in the following code.
The usage of format I learnt is a string with {} as placeholders, followed by .format(argument, argument...).
e.g. 'My name is {}, I am {}.'.format('peter', 18)
I don't understand the line new_url = format(self.url%self.page_num). What is the function of % in that line?
(I think the purpose of that line is to insert the page number into the url template, so that a new url is created.)
# -*- coding: utf-8 -*-
import scrapy


class XiaohuaSpider(scrapy.Spider):
    name = 'xiaohua'
    # allowed_domains = ['www.xxx.com']
    start_urls = ['http://www.521609.com/meinvxiaohua/']

    url = 'http://www.521609.com/meinvxiaohua/list12%d.html'
    page_num = 2

    def parse(self, response):
        li_list = response.xpath('//*[@id="content"]/div[2]/div[2]/ul/li')
        for li in li_list:
            img_name = li.xpath('./a[2]/b/text() | ./a[2]/text()').extract_first()
            print(img_name)

        if self.page_num <= 11:
            new_url = format(self.url%self.page_num)
            self.page_num += 1
            yield scrapy.Request(url=new_url,callback=self.parse)
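
To isolate the part I am asking about, here is a minimal sketch of that one line outside of Scrapy. It assumes that % here is Python's printf-style string operator (filling the %d in the template) and that format() is the built-in function rather than the str.format() method. The names template, wrapped and same_url are made up for this illustration and do not appear in the tutorial.

```python
# URL template copied from the spider; %d is a printf-style placeholder for an integer.
url = 'http://www.521609.com/meinvxiaohua/list12%d.html'
page_num = 2

# The % operator substitutes page_num into the %d placeholder.
new_url = url % page_num

# The built-in format(value), called without a format spec, returns the
# string unchanged, so wrapping the result in format() adds nothing here.
wrapped = format(url % page_num)

# The same URL built with str.format(), using {} as the placeholder.
# 'template' is a hypothetical rewrite of the tutorial's url attribute.
template = 'http://www.521609.com/meinvxiaohua/list12{}.html'
same_url = template.format(page_num)

print(new_url)              # http://www.521609.com/meinvxiaohua/list122.html
print(wrapped == new_url)   # True
print(same_url == new_url)  # True
```

If this reading is right, the substitution is done entirely by %, and the surrounding format() call could be dropped without changing new_url.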