Python Forum
[Python 3] - Extract specific data from a web page using lxml module
Using the xpath() method that lxml elements provide, you can select all td elements that have no span child like this:

from lxml import html

html_text = """<html>
  <head>
    <title>foo</title>
  </head>
  <body>
    <tr>
      <td><span class="number blue">xx</span></td>
      <td>001</td>
      <td>002</td>
    </tr>
  </body>
</html>"""


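# Parse the HTML string into an lxml element tree.
et = html.fromstring(html_text)
# Select the span whose class attribute is exactly "number blue".
spans = et.xpath('//tr/td/span[@class="number blue"]')
print(spans[0].text)
# Select every td that has no span child and print its text.
for e in et.xpath('//tr/td[not(span)]'):
    print(e.text)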
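
The same two queries can also be wrapped in a small helper that returns the values instead of printing them, so an empty result does not raise an IndexError the way spans[0] would. This is only a minimal sketch: the function name extract_cells, its return shape, and the sample markup are assumptions for illustration, not part of the original page being scraped.

```python
from lxml import html


def extract_cells(html_text):
    """Return (highlighted, plain) cell texts found in html_text.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    root = html.fromstring(html_text)
    # Text of every span whose class attribute is exactly "number blue".
    highlighted = [s.text for s in root.xpath('//tr/td/span[@class="number blue"]')]
    # Text of every td that has no span child.
    plain = [td.text for td in root.xpath('//tr/td[not(span)]')]
    return highlighted, plain


# Assumed sample markup, modelled on the snippet above.
sample = """<html>
  <body>
    <table>
      <tr>
        <td><span class="number blue">xx</span></td>
        <td>001</td>
        <td>002</td>
      </tr>
    </table>
  </body>
</html>"""

highlighted, plain = extract_cells(sample)
print(highlighted)
print(plain)
```

Note that @class="number blue" compares the whole attribute string, so a span with class "blue number" or with an extra class would not match. If the real page varies the class order, the predicate would need to be adjusted.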


Messages In This Thread
RE: [Python 3] - Extract specific data from a web page using lxml module - by leotrubach - Aug-25-2018, 08:46 AM
