Jul-22-2018, 04:10 PM
Basically, the XPaths a browser generates for you are way too specific.
You get an XPath that selects a single element, mostly identifying it by its location in the DOM (a <b> inside the 6th <td> inside of the 3rd <tr> inside a <tbody> of a <table> inside of a <td> ...).
This means that any simple change in the website will break the XPath, so it might not even work on two pages of the same website.
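To make that concrete, here is a minimal sketch with lxml. The two page snippets, the id="price" attribute, and the extract helper are made up for illustration; they are not from the site in question. It compares a purely positional XPath with one anchored on an attribute when a second page has one extra column.

```python
from lxml import html

# Two hypothetical pages from the same site; PAGE_B has one extra leading cell.
PAGE_A = (
    "<html><body><table>"
    "<tr><td>Name</td><td id='price'><b>10.50</b></td></tr>"
    "</table></body></html>"
)
PAGE_B = (
    "<html><body><table>"
    "<tr><td>New!</td><td>Name</td><td id='price'><b>12.00</b></td></tr>"
    "</table></body></html>"
)

# Positional: identifies the element only by where it sits in the tree.
POSITIONAL = "/html/body/table/tr/td[2]/b/text()"
# Anchored: identifies the element by something that describes it.
ANCHORED = "//td[@id='price']/b/text()"


def extract(page, xpath):
    """Parse an HTML string and return whatever the XPath matches."""
    return html.fromstring(page).xpath(xpath)


positional_results = [extract(page, POSITIONAL) for page in (PAGE_A, PAGE_B)]
anchored_results = [extract(page, ANCHORED) for page in (PAGE_A, PAGE_B)]

print(positional_results)  # [['10.50'], []]
print(anchored_results)    # [['10.50'], ['12.00']]
```

The positional XPath silently returns nothing on the second page, while the anchored one keeps working. On a real site you would pick whatever stable id, class, or text the page actually offers.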
Another problem is that these XPaths are generated after the web page has been fully rendered, so they reflect any JavaScript the browser executed (lxml does not run JavaScript). With invalid HTML, the browser might also add, remove, or move some elements while repairing it, and the result depends on the browser.
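The <tbody> case is a common instance of this. Browsers add a <tbody> to a table whose source has none, so a copied XPath includes it, while lxml parses the markup as written. A small sketch, using a made-up one-cell table:

```python
from lxml import html

# Hypothetical source markup: note there is no <tbody> in it.
RAW_HTML = "<html><body><table><tr><td>cell</td></tr></table></body></html>"
tree = html.fromstring(RAW_HTML)

# What a browser's "copy XPath" would typically give you (with tbody).
browser_style = tree.xpath("/html/body/table/tbody/tr/td/text()")
# What actually matches the markup lxml parsed.
source_style = tree.xpath("/html/body/table/tr/td/text()")
# A version that does not care whether a tbody is present.
tolerant = tree.xpath("//table//tr/td/text()")

print(browser_style)  # []
print(source_style)   # ['cell']
print(tolerant)       # ['cell']
```

Using // between table and tr is one way to write an expression that works both on the raw source and on the browser's rendered DOM.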
The reason this particular XPath worked with Selenium is that Selenium probably used the same browser you normally use, so it was querying the same rendered DOM the XPath was generated from.