Python Forum
Need help with XPath using requests,time,urllib.request and BeautifulSoup
#1
I have an xpath expression that I know works. Using the URL:
https://www.yellowpages.com/houston-tx/m...1657186981

and XPath:
//div[@class='sales-info']/H1[1]

Should return this:
Spector Ivan

My code is posted below. Can anyone please explain why it doesn't work here?
It works in Scrapy, but I can't multi-thread in Scrapy, so I'm looking for an alternative.

Thanks.

import requests, time, urllib.request, concurrent.futures, pandas as pd  # proxy checker: https://stackoverflow.com/questions/765305/proxy-check-in-python
from bs4 import BeautifulSoup
from lxml import html

url = 'https://www.yellowpages.com/houston-tx/mip/spector-ivan-11449879?lid=1001657186981'

proxy_handler = urllib.request.ProxyHandler({'http': '149.19.32.99:8082'})
opener = urllib.request.build_opener(proxy_handler)
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)

pg=urllib.request.urlopen(url) 

soup = BeautifulSoup(pg,'lxml')

tree = html.fromstring(soup.prettify())
testdata = tree.xpath("//div[@class='sales-info']/H1[1]")
print('XPath data: ', testdata)
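
One thing worth checking in the snippet above: the URL is https://, but the ProxyHandler dict only has an 'http' key. urllib picks the proxy by the URL's scheme, so as written the request would not go through the proxy at all. A minimal sketch of the setup with both schemes listed, reusing the proxy address from the post (whether that proxy actually handles HTTPS traffic is an assumption):

```python
import urllib.request

# Proxy address copied from the post; whether it supports HTTPS is an assumption.
PROXY = '149.19.32.99:8082'

# urllib selects a proxy by the scheme of the requested URL, so an
# https:// URL needs its own 'https' entry next to the 'http' one.
proxy_handler = urllib.request.ProxyHandler({'http': PROXY, 'https': PROXY})

opener = urllib.request.build_opener(proxy_handler)
opener.addheaders = [('User-agent', 'Mozilla/5.0')]

# Make this opener the default used by urllib.request.urlopen().
urllib.request.install_opener(opener)
```

This only changes how the page is fetched; the XPath itself is a separate issue.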
#2
Maybe something more like...?

>>> tree.xpath("//div[@class='sales-info']/h1/text()")[0]
'\n        Spector  Ivan\n       '
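
For what it's worth, the lowercase h1 matters because lxml's HTML parser lowercases element names, and XPath name tests are case-sensitive, so H1 finds nothing while h1 matches. A small self-contained check on a made-up fragment (the markup below is only an illustration, not the real page):

```python
from lxml import html

# Hypothetical markup standing in for the real page; note the uppercase <H1>.
SAMPLE = (
    "<html><body>"
    "<div class='sales-info'><H1>Spector Ivan</H1></div>"
    "</body></html>"
)

tree = html.fromstring(SAMPLE)

# The parser stores the tag as 'h1', so the uppercase name test matches nothing.
upper = tree.xpath("//div[@class='sales-info']/H1[1]")
lower = tree.xpath("//div[@class='sales-info']/h1[1]")

print(len(upper))
print(lower[0].tag, lower[0].text)
```

Note also that without /text() the result is a list of element objects, not strings, which is why printing it does not show the name directly.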
#3
Thanks but that didn't do it:
IndexError: list index out of range
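
That IndexError means xpath() returned an empty list, i.e. nothing in the parsed page matched, so [0] had nothing to pick. One possibility is that the HTML you received is not the page you expected. A sketch of a small helper that returns None instead of raising, which makes that case easier to spot (first_text is a made-up name, not part of lxml, and both sample pages are invented):

```python
from lxml import html

def first_text(tree, expr):
    """Return the first XPath result as a stripped string, or None if nothing matched."""
    results = tree.xpath(expr)
    if not results:
        return None
    return str(results[0]).strip()

# Hypothetical pages: one with the expected markup, one without it.
good = html.fromstring(
    "<html><body><div class='sales-info'><h1> Spector Ivan </h1></div></body></html>"
)
other = html.fromstring("<html><body><p>Access denied</p></body></html>")

expr = "//div[@class='sales-info']/h1/text()"
print(first_text(good, expr))
print(first_text(other, expr))
```

If the helper returns None on the real page, printing the first few hundred characters of the downloaded HTML would show what actually came back.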
#4
Odd, I just changed that one line and it "works" for me.
...
#testdata = tree.xpath("//div[@class='sales-info']/H1[1]")
testdata = tree.xpath("//div[@class='sales-info']/h1/text()")[0]
print('XPath data: ', testdata)
Output:
XPath data: Spector Ivan
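
If the surrounding whitespace from text() gets in the way, XPath's normalize-space() is one option: it trims both ends, collapses internal runs of whitespace, and gives back an empty string rather than raising when nothing matches. A sketch on an illustrative fragment (again, not the real page):

```python
from lxml import html

# Hypothetical markup with the same kind of padding seen in the reply above.
SAMPLE = """<html><body>
<div class='sales-info'>
  <h1>
        Spector  Ivan
       </h1>
</div>
</body></html>"""

tree = html.fromstring(SAMPLE)

# normalize-space() returns a single string instead of a list of nodes.
name = tree.xpath("normalize-space(//div[@class='sales-info']/h1)")
print(repr(name))

# No match yields an empty string, so there is no index to go out of range.
missing = tree.xpath("normalize-space(//div[@class='no-such-class']/h1)")
print(repr(missing))
```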