Python Forum
[Help]xpath is not working with lxml
#1
Hi all,

I am trying to collect some data from a website.
I use an XPath expression to select a specific entry, but it doesn't seem to work:
from lxml import html
import requests
headers = {}
headers['User-Agent'] = "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:48.0) Gecko/20100101 Firefox/48.0"

url='https://finviz.com/quote.ashx?t=intc'
url_get = requests.get(url,headers=headers)
tree = html.fromstring(url_get.content)
x = '/html/body/table[3]/tbody/tr[1]/td/table/tbody/tr[7]/td/table/tbody/tr[3]/td[6]/b'
lxml_soup = tree.xpath(x)
print(lxml_soup)
Unfortunately, it only prints an empty list:
Output:
[]
#2
Just copying an XPath from your browser's developer tools is rarely going to work, at least not reliably.
I would recommend learning the basics of XPath and writing the expressions yourself.

Here are a couple of ways you can get what you want (this is a scrapy shell session, but the same xpaths will work in lxml):
>>> # the 6th cell of the 3rd row of the "snapshot-table2"
>>> response.xpath('//table[@class="snapshot-table2"]//tr[3]/td[6]/b/text()').get()
'0.95'
>>> # the cell after the one containing the text "EPS next Q"
>>> response.xpath('//td[.="EPS next Q"]/following-sibling::td[1]/b/text()').get()
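For anyone not using Scrapy, here is a rough lxml version of the same two queries. Note that `.get()` belongs to Scrapy's selectors; in lxml, `xpath()` returns a plain list, so the sketch takes the first item itself. The `SAMPLE` markup and the `first` helper are made up for illustration (the real page has far more rows and cells), and only the "EPS next Q" value is taken from the session above.

```python
from lxml import html

# Hypothetical, heavily simplified stand-in for the page's snapshot table.
# Every value except "0.95" is invented filler.
SAMPLE = """
<html><body>
<table class="snapshot-table2">
  <tr><td>Row 1</td><td><b>a</b></td><td>x</td><td><b>b</b></td><td>y</td><td><b>c</b></td></tr>
  <tr><td>Row 2</td><td><b>d</b></td><td>x</td><td><b>e</b></td><td>y</td><td><b>f</b></td></tr>
  <tr><td>Row 3</td><td><b>g</b></td><td>x</td><td><b>h</b></td><td>EPS next Q</td><td><b>0.95</b></td></tr>
</table>
</body></html>
"""

def first(tree, expr):
    """Return the first XPath result, or None if nothing matched."""
    results = tree.xpath(expr)
    return results[0] if results else None

tree = html.fromstring(SAMPLE)

# The 6th cell of the 3rd row of the "snapshot-table2" table.
by_position = first(tree, '//table[@class="snapshot-table2"]//tr[3]/td[6]/b/text()')

# The cell right after the one whose text is "EPS next Q".
by_label = first(tree, '//td[.="EPS next Q"]/following-sibling::td[1]/b/text()')

print(by_position, by_label)
```

To run it against the live page, replace `SAMPLE` with `url_get.content` from the first post; whether the expressions still match depends on the site's current markup.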
#3
Many thanks for the help.

Is there any reason why it is not reliable?
It worked perfectly for me with Selenium.
#4
Basically, they are way too specific.
You get an XPath that selects a single element, mostly identifying it by its location in the DOM (a b inside the 6th td inside of the 3rd tr inside a tbody of a table inside of a td...).

This means that any simple change in the website will break the XPath, so it might not even work on two pages of the same website.
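To make that concrete, here is a small sketch with two versions of an invented page: one as-is, and one where an extra table (say, an ad banner) appears above the data. The `page` function and its markup are assumptions for illustration only, not the real site.

```python
from lxml import html

def page(extra_banner):
    """Build a tiny hypothetical page, optionally with an extra table on top."""
    banner = "<table><tr><td>Ad banner</td></tr></table>" if extra_banner else ""
    return (
        "<html><body>" + banner +
        '<table class="snapshot-table2">'
        "<tr><td>EPS next Q</td><td><b>0.95</b></td></tr>"
        "</table></body></html>"
    )

# Purely positional path, in the style of what developer tools generate.
ABSOLUTE = "/html/body/table[1]/tr[1]/td[2]/b/text()"
# Path anchored on the label text instead of on the layout.
ANCHORED = '//td[.="EPS next Q"]/following-sibling::td[1]/b/text()'

results = {}
for extra in (False, True):
    tree = html.fromstring(page(extra))
    results[extra] = (tree.xpath(ABSOLUTE), tree.xpath(ANCHORED))
    print(extra, results[extra])
```

The positional path finds the value only on the page without the banner, while the label-anchored path finds it on both.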

Another problem is that these XPaths are generated after the web page has been fully rendered, so the browser takes into account any JavaScript code that was executed (which lxml doesn't do), and in cases of invalid HTML, the browser might add/remove/move some elements (which depends on the browser).
The reason this particular XPath worked with Selenium is that Selenium probably used the same browser you normally use.
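One common case of the browser adding elements is `<tbody>`: browsers insert it into tables whose source doesn't have one, while lxml keeps the markup as written. That may well be what is happening with the XPath in the first post, though I haven't checked the page's raw source, so treat it as a guess. A minimal sketch with made-up markup:

```python
from lxml import html

# The source omits <tbody>, as a lot of real-world HTML does.
SOURCE = "<html><body><table><tr><td><b>0.95</b></td></tr></table></body></html>"
tree = html.fromstring(SOURCE)

# Path as a browser's DOM would show it, with the inserted tbody.
with_tbody = tree.xpath("/html/body/table/tbody/tr/td/b/text()")
# Path matching the markup as lxml parsed it.
without_tbody = tree.xpath("/html/body/table/tr/td/b/text()")

print(with_tbody)     # []
print(without_tbody)  # ['0.95']
```

If you do start from a copied XPath, dropping the `tbody` steps (or using `//tr` under the table) is a quick thing to try.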