[Help]xpath is not working with lxml

mr_byte31 · Jul-22-2018, 10:14 AM

Hi All,

I am trying to collect some info from a website.
I use an XPath for a specific entry, but it doesn't seem to work.

from lxml import html
import requests
headers = {}
headers['User-Agent'] = "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:48.0) Gecko/20100101 Firefox/48.0"

url='https://finviz.com/quote.ashx?t=intc'
url_get = requests.get(url,headers=headers)
tree = html.fromstring(url_get.content)
x = '/html/body/table[3]/tbody/tr[1]/td/table/tbody/tr[7]/td/table/tbody/tr[3]/td[6]/b'
lxml_soup = tree.xpath(x)
print(lxml_soup)

Unfortunately, it only prints an empty list:

Output:
[]

***stranac*** · (This post was last modified: Jul-22-2018, 11:21 AM by stranac.)

Just copying an XPath from your browser's developer tools is rarely going to work, at least not reliably.
I would recommend learning the basics of XPath and writing the expressions yourself.

Here are a couple of ways you can get what you want (this is a Scrapy shell session, but the same XPaths will work in lxml):

>>> # the 6th cell of the 3rd row of the "snapshot-table2"
>>> response.xpath('//table[@class="snapshot-table2"]//tr[3]/td[6]/b/text()').get()
'0.95'
>>> # the cell after the one containing the text "EPS next Q"
>>> response.xpath('//td[.="EPS next Q"]/following-sibling::td[1]/b/text()').get()
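
For reference, here is a minimal sketch of the same two expressions run through lxml directly. The `SAMPLE` markup is a hypothetical, heavily simplified stand-in for the real page (only the class name, the "EPS next Q" label and the 0.95 value come from the session above), so treat it as an illustration rather than the site's actual HTML. Note that lxml's `xpath()` returns a list, so there is no `.get()` as in Scrapy.

```python
from lxml import html

# Hypothetical, simplified stand-in for the page's markup (not the real HTML).
SAMPLE = """
<html><body>
<table class="snapshot-table2">
  <tr><td>Row 1</td><td><b>a</b></td></tr>
  <tr><td>Row 2</td><td><b>b</b></td></tr>
  <tr>
    <td>Label 1</td><td><b>c</b></td>
    <td>Label 2</td><td><b>d</b></td>
    <td>EPS next Q</td><td><b>0.95</b></td>
  </tr>
</table>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# The 6th cell of the 3rd row of the "snapshot-table2" table.
by_position = tree.xpath('//table[@class="snapshot-table2"]//tr[3]/td[6]/b/text()')

# The cell after the one containing the text "EPS next Q".
by_label = tree.xpath('//td[.="EPS next Q"]/following-sibling::td[1]/b/text()')

# xpath() returns a list of matches; take the first one if there is any.
value = by_label[0] if by_label else None
print(by_position, by_label, value)
```

With the original script, the same expressions would be passed to `tree.xpath(...)` on the tree built from `url_get.content`.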

mr_byte31 · Jul-22-2018, 11:36 AM

Many thanks for the help.

Is there any reason why it is not reliable?
It worked perfectly for me with Selenium.

***stranac*** · Jul-22-2018, 04:10 PM

Basically, they are way too specific.
You get an XPath that selects a single element, mostly identifying it by its location in the DOM (a b inside the 6th td inside of the 3rd tr inside a tbody of a table inside of a td...).

This means that any simple change in the website will break the XPath, so it might not even work on two pages of the same website.
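
A small sketch of that fragility, using made-up markup: `make_page`, the "P/E" and "New row" rows and their values are all hypothetical and only serve to show how one extra row shifts a position-based XPath while a label-based one keeps working.

```python
from lxml import html


def make_page(extra_row: bool) -> str:
    """Build a tiny hypothetical table, optionally with one extra leading row."""
    extra = "<tr><td>New row</td><td><b>n/a</b></td></tr>" if extra_row else ""
    return (
        "<table class='snapshot-table2'>"
        + extra
        + "<tr><td>P/E</td><td><b>10</b></td></tr>"
        + "<tr><td>EPS next Q</td><td><b>0.95</b></td></tr>"
        + "</table>"
    )


# Identifies the cell purely by its position in the table.
POSITIONAL = '//table//tr[2]/td[2]/b/text()'
# Identifies the cell by the label next to it.
BY_LABEL = '//td[.="EPS next Q"]/following-sibling::td[1]/b/text()'

results = {}
for extra_row in (False, True):
    tree = html.fromstring(make_page(extra_row))
    results[extra_row] = (tree.xpath(POSITIONAL), tree.xpath(BY_LABEL))
    print(extra_row, results[extra_row])
```

Once the extra row is present, the positional expression silently returns the value of a different row, while the label-based one still finds the intended cell.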

Another problem is that these XPaths are generated after the web page has been fully rendered, so the browser takes into account any JavaScript code that was executed (which lxml doesn't do), and in cases of invalid HTML, the browser might add/remove/move some elements (which depends on the browser).
The reason this particular XPath worked with Selenium is that Selenium probably used the same browser you normally use.
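
One common instance of a browser adding elements is `<tbody>`: browsers insert it into the DOM for tables even when the served HTML has none, whereas lxml's HTML parser keeps the rows directly under `<table>`. Whether that is the only difference on this particular page is an assumption. The sketch below uses hypothetical one-cell markup to show the effect on an absolute path like the one in the question.

```python
from lxml import html

# Hypothetical markup: the raw HTML contains no <tbody>, as served pages often don't.
RAW = "<html><body><table><tr><td><b>0.95</b></td></tr></table></body></html>"
tree = html.fromstring(RAW)

# Path as copied from browser developer tools, which show an inserted <tbody>.
browser_style = tree.xpath('/html/body/table/tbody/tr/td/b/text()')

# Same path written against the HTML as lxml parses it (no <tbody>).
source_style = tree.xpath('/html/body/table/tr/td/b/text()')

print(browser_style)
print(source_style)
```

Here the browser-style path matches nothing, while the path without `tbody` finds the cell. Writing `//tr` instead of spelling out each level avoids depending on `tbody` either way.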
