Apr-24-2021, 12:32 AM
I have an XPath expression that I know works. Using the URL:
https://www.yellowpages.com/houston-tx/m...1657186981
and XPath:
//div[@class='sales-info']/H1[1]
Should return this:
Spector Ivan
My code is posted below. Can anyone please explain why it doesn't work here?
It works using Scrapy, but I cannot multi-thread in Scrapy, so I'm looking for an alternative.
Thanks.
import requests,time,urllib.request, concurrent.futures, pandas as pd
#proxy checker < https://stackoverflow.com/questions/765305/proxy-check-in-python >
from bs4 import BeautifulSoup
import time
from lxml import html

url = 'https://www.yellowpages.com/houston-tx/mip/spector-ivan-11449879?lid=1001657186981'
proxy_handler = urllib.request.ProxyHandler({'http': '149.19.32.99:8082'})
opener = urllib.request.build_opener(proxy_handler)
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)
pg=urllib.request.urlopen(url)
soup = BeautifulSoup(pg,'lxml')
tree = html.fromstring(soup.prettify())
testdata = tree.xpath("//div[@class='sales-info']/H1[1]")
print('XPath data: ', testdata)
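One likely cause, offered as a sketch rather than a confirmed diagnosis: lxml's HTML parser lowercases element names, and XPath matching is case-sensitive, so a step written as H1 finds nothing while h1 does. Also, xpath() returns a list of element objects, so the text has to be read from the first match. The snippet below uses SAMPLE_PAGE, a made-up stand-in for the downloaded page so it runs offline, and first_heading, a hypothetical helper name. Neither comes from the original post, and the real page markup may differ.

```python
from lxml import html

# Hypothetical stand-in for the downloaded page (assumption: the real page
# has a similar div/h1 structure). Note the uppercase tag in the source.
SAMPLE_PAGE = """
<html><body>
  <div class="sales-info"><H1>Spector Ivan</H1></div>
</body></html>
"""


def first_heading(page_source):
    """Return the stripped text of the first h1 inside div.sales-info, or None."""
    tree = html.fromstring(page_source)
    # Lowercase h1: the HTML parser normalizes tag names to lowercase.
    matches = tree.xpath("//div[@class='sales-info']/h1[1]")
    return matches[0].text_content().strip() if matches else None


tree = html.fromstring(SAMPLE_PAGE)

# The uppercase step from the question matches nothing after parsing.
upper = tree.xpath("//div[@class='sales-info']/H1[1]")
print("Uppercase H1 matches:", upper)

# The lowercase step finds the element, and its text can then be read.
print("Lowercase h1 text:", first_heading(SAMPLE_PAGE))
```

If that is the problem, the BeautifulSoup and prettify() round trip is probably unnecessary as well, since html.fromstring can parse the response body directly. The strip() call matters if prettify() is kept, because it adds whitespace around the text.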