Python Forum

Hi All,

I am trying to collect some info from a website.
I use an XPath for a specific entry, but it doesn't seem to work.

from lxml import html
import requests
headers = {}
headers['User-Agent'] = "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:48.0) Gecko/20100101 Firefox/48.0"

url='https://finviz.com/quote.ashx?t=intc'
url_get = requests.get(url,headers=headers)
tree = html.fromstring(url_get.content)
x = '/html/body/table[3]/tbody/tr[1]/td/table/tbody/tr[7]/td/table/tbody/tr[3]/td[6]/b'
lxml_soup = tree.xpath(x)
print(lxml_soup)

Unfortunately, nothing is matched; it just prints an empty list:

Output:
[]

Just copying an XPath from your browser's developer tools is rarely going to work, at least not reliably.
I would recommend learning the basics of XPath and writing the expressions yourself.

Here are a couple of ways you can get what you want (this is a Scrapy shell session, but the same XPaths will work with lxml's tree.xpath(), which returns a list of matches):

>>> # the 6th cell of the 3rd row of the "snapshot-table2"
>>> response.xpath('//table[@class="snapshot-table2"]//tr[3]/td[6]/b/text()').get()
'0.95'
>>> # the cell after the one containing the text "EPS next Q"
>>> response.xpath('//td[.="EPS next Q"]/following-sibling::td[1]/b/text()').get()
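For lxml, a rough equivalent of the two selectors above could look like the sketch below. The HTML is a made-up, heavily simplified stand-in for the real page (so no network request is needed), and first_text is a hypothetical helper that mimics Scrapy's .get(), not part of lxml.

```python
from lxml import html

# Made-up stand-in for the page's snapshot table (assumption: the real
# table has many more rows and cells; the values here are illustrative).
SAMPLE_HTML = """
<html><body>
<table class="snapshot-table2">
<tr><td>a1</td><td><b>1</b></td><td>a2</td><td><b>2</b></td><td>a3</td><td><b>3</b></td></tr>
<tr><td>b1</td><td><b>4</b></td></tr>
<tr><td>c1</td><td><b>7</b></td><td>c2</td><td><b>8</b></td><td>EPS next Q</td><td><b>0.95</b></td></tr>
</table>
</body></html>
"""

tree = html.fromstring(SAMPLE_HTML)


def first_text(tree, xpath):
    """Hypothetical helper: first match of an XPath, or None if there is none."""
    matches = tree.xpath(xpath)
    return matches[0] if matches else None


# The 6th cell of the 3rd row of the "snapshot-table2" table.
by_position = first_text(tree, '//table[@class="snapshot-table2"]//tr[3]/td[6]/b/text()')
# The cell after the one containing the text "EPS next Q".
by_label = first_text(tree, '//td[.="EPS next Q"]/following-sibling::td[1]/b/text()')

print(by_position, by_label)
```

With the real page you would build the tree from the response content, as in the original post, and keep the XPaths unchanged.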

Many thanks for the help.

Is there any reason why it is not reliable?
It was working perfectly for me with Selenium.

Basically, they are way too specific.
You get an XPath that selects a single element, mostly identifying it by its location in the DOM (a b inside the 6th td inside of the 3rd tr inside a tbody of a table inside of a td...).

This means that any simple change in the website will break the XPath, so it might not even work on two pages of the same website.
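To make that concrete, here is a small sketch with two made-up versions of a page: the data is identical, but the second version has an extra table inserted above it. The markup and the names PAGE_V1, PAGE_V2, POSITIONAL and BY_LABEL are assumptions for illustration only.

```python
from lxml import html

ROW = '<tr><td>EPS next Q</td><td><b>0.95</b></td></tr>'

# Version 1: the data table is the first table on the page.
PAGE_V1 = f'<html><body><table>{ROW}</table></body></html>'
# Version 2: same data, but a banner table now comes first.
PAGE_V2 = (
    '<html><body>'
    '<table><tr><td>Banner</td></tr></table>'
    f'<table>{ROW}</table>'
    '</body></html>'
)

# Purely positional path, in the style of a copied browser XPath.
POSITIONAL = '/html/body/table[1]/tr[1]/td[2]/b/text()'
# Path anchored on the label text instead of the position.
BY_LABEL = '//td[.="EPS next Q"]/following-sibling::td[1]/b/text()'

results = {}
for name, page in (('v1', PAGE_V1), ('v2', PAGE_V2)):
    tree = html.fromstring(page)
    results[name] = (tree.xpath(POSITIONAL), tree.xpath(BY_LABEL))
    print(name, results[name])
```

The positional path only finds the value in the first version, while the label-based one finds it in both.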

Another problem is that these XPaths are generated after the web page has been fully rendered, so they reflect any JavaScript that was executed (which lxml doesn't do). In cases of invalid HTML, the browser might also add, remove, or move some elements, and exactly how depends on the browser.
The reason this particular XPath worked with Selenium is that Selenium was probably driving the same browser you normally use.
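One common case of the browser adding elements is tbody: browsers insert it into tables when building the DOM even if the served HTML doesn't contain it, and lxml does not. That is a likely culprit for the tbody steps in the copied path, although I haven't checked it against the real page. A minimal sketch, using made-up markup:

```python
from lxml import html

# Made-up markup: the served HTML contains no <tbody>.
RAW_HTML = '<html><body><table><tr><td><b>0.95</b></td></tr></table></body></html>'
tree = html.fromstring(RAW_HTML)

# Path as a browser's developer tools would typically report it.
browser_style = tree.xpath('/html/body/table/tbody/tr[1]/td[1]/b/text()')
# Same path without the tbody step, matching the HTML as served.
source_style = tree.xpath('/html/body/table/tr[1]/td[1]/b/text()')
# lxml keeps the tree as written, so there is no tbody element at all.
has_tbody = bool(tree.xpath('//tbody'))

print(browser_style, source_style, has_tbody)
```

Only the path without tbody matches here, which is why writing //table//tr instead of spelling out every level tends to be safer.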

mr_byte31

stranac

mr_byte31

stranac