Mar-27-2021, 05:18 PM
Hi all,
I have an issue crawling results from Google. The first results are extracted correctly, but the 8th result throws an exception.
This is my Spider:
import scrapy
from scrapy.linkextractors import LinkExtractor

#GoogleSpider
class GoogleSpider(scrapy.Spider):
    name = "GoogleSpider"
    start_urls = ["https://www.google.com/search?q=journal+dev"]

    def parse(self, response):
        xlink = LinkExtractor()
        link_list=[]
        link_text=[]
        divs = response.xpath('//div')
        text_list=[]
        # collect long text nodes found directly inside <div> elements
        for span in divs.xpath('text()'):
            if len(str(span.get()))>100:
                text_list.append(span.get())
        # collect links that look like search results
        for link in xlink.extract_links(response):
            if len(str(link))>200 or 'Journal' in link.text:
                print(len(str(link)),link.text,link,"\n")
                link_list.append(link)
                link_text.append(link.text)
        # pad text_list so it has as many entries as link_text
        for i in range(len(link_text)-len(text_list)):
            text_list.append(" ")

The link that causes the error has link.text = "_ElementUnicodeResult: Pankaj Kumar (@JournalDev) | টুইটার - Twittertwitter.com › journaldev"
The error says: UnicodeEncodeError: 'charmap' codec can't encode characters in position 29-34: character maps to <undefined>
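Positions 29-34 seem to line up with the six Bengali characters in that link text, so the failure may come from print() writing to a console with a legacy code page rather than from Scrapy itself. Below is a minimal sketch that tries to reproduce this without Scrapy. It assumes the console encoding is cp1252, which is only a guess (the actual encoding is not known here), and the helper names unencodable_span and safe_for_console are made up for illustration.

```python
# Assumption: the console uses cp1252, a common default on Windows.
# The encoding actually in use on the affected machine is not known.
CONSOLE_ENCODING = "cp1252"


def unencodable_span(text, encoding=CONSOLE_ENCODING):
    """Return (first, last) index of the first unencodable run, or None."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        # exc.end is exclusive; the error message reports exc.end - 1.
        return exc.start, exc.end - 1
    return None


def safe_for_console(text, encoding=CONSOLE_ENCODING):
    """Swap unencodable characters for backslash escapes before printing."""
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


if __name__ == "__main__":
    # Shortened version of the link text that triggers the error.
    link_text = "Pankaj Kumar (@JournalDev) | টুইটার - Twitter"
    print(unencodable_span(link_text))
    print(safe_for_console(link_text))
```

If this matches, wrapping link.text in something like safe_for_console before the print() call in parse might avoid the exception; switching the console or Python's output encoding to UTF-8 could be another option.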
Can anyone help me with that?
thx,
kon