Nov-13-2020, 09:37 AM
Hello,
The following code is a web spider from a Python tutorial, but my question is not about web spiders; it is a general Python question.
I'd like to understand how format() is used in the code below.
The usage of format I have learnt is a string with {} placeholders, followed by .format(argument, argument, ...).
e.g. 'My name is {}, I am {}.'.format('peter', 18)
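For reference, a minimal sketch of that str.format() usage; the variable names are only for illustration.

```python
# str.format() fills each {} placeholder with the matching argument, in order.
template = 'My name is {}, I am {}.'
sentence = template.format('peter', 18)
print(sentence)  # My name is peter, I am 18.
```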
I don't understand the line new_url = format(self.url % self.page_num). What does the % do there?
(The purpose of that line is to combine the URL template with the page number to create a new URL.)
    # -*- coding: utf-8 -*-
    import scrapy


    class XiaohuaSpider(scrapy.Spider):
        name = 'xiaohua'
        # allowed_domains = ['www.xxx.com']
        page_num = 2

        def parse(self, response):
            li_list = response.xpath('//*[@id="content"]/div[2]/div[2]/ul/li')
            for li in li_list:
                img_name = li.xpath('./a[2]/b/text() | ./a[2]/text()').extract_first()
                print(img_name)

            if self.page_num <= 11:
                new_url = format(self.url % self.page_num)
                self.page_num += 1
                yield scrapy.Request(url=new_url, callback=self.parse)
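To isolate what the new_url line does, here is a small sketch outside Scrapy. The URL template is a made-up placeholder, since the spider's real url attribute is not shown in the post; the sketch assumes the template contains a %d placeholder, which is what the % operator needs. On a string, % is Python's older printf-style formatting. The built-in format(value), called with a single argument, usually gives the same result as str(value), so wrapping an already-formatted string in it should change nothing.

```python
# Hypothetical template; assumed to contain a %d placeholder for the page number.
url_template = 'http://www.example.com/list/index_%d.html'
page_num = 2

# The % operator does printf-style formatting: %d is replaced by page_num.
with_percent = url_template % page_num

# Built-in format() with one argument behaves like str(value) here,
# so the already-formatted string comes back unchanged.
wrapped = format(with_percent)

# The same URL built with the str.format() method and a {} placeholder.
with_format = 'http://www.example.com/list/index_{}.html'.format(page_num)

print(with_percent)
print(wrapped == with_percent)
print(with_format == with_percent)
```

Under these assumptions the % operator does the actual joining, and the outer format() call is redundant.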