Python Forum
[Python 3] - Extract specific data from a web page using lxml module
(Aug-23-2018, 07:01 PM) snippsat wrote: Remove text() from the XPath expression; you can use .text from lxml instead.
You can also read the CSS class from .attrib.
from lxml import etree

# Simulate a web page
html = '''\
<html>
  <head>
    <title>foo</title>
  </head>
  <body>
    <tr>
      <td><span class="number blue">xx</span></td>
      <td>001</td>
      <td>002</td>
    </tr>
  </body>
</html>'''

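# etree.fromstring() is an XML parser; it works here because the sample is well-formed
tree = etree.fromstring(html)
# Select every <span> whose class attribute is exactly 'number blue'
span_tag = tree.xpath("//span[@class='number blue']")
print(span_tag[0].text)                 # text inside the span
print(span_tag[0].attrib.get('class'))  # value of the class attribute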
Output:
xx
number blue

Thanks for your reply. However, I want to get the two values (i.e. 001 and 002) within the <td> tags. They all belong to the same span class (i.e. number blue).

Any idea how to get these values neatly?
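
One possible way to get those values, as a sketch rather than a definitive answer: in the sample above, 001 and 002 are not inside the span; they sit in the <td> cells that follow the span's cell in the same row. The code below assumes that layout holds on the real page. The helper name values_after_span is made up for illustration. It only uses iter(), find() and findall(), which lxml shares with the standard library's ElementTree, so it also runs if lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback (an assumption for portability): the stdlib parser offers the same calls used below
    import xml.etree.ElementTree as etree

# Same simulated page as in the quoted post
html = '''\
<html>
  <head>
    <title>foo</title>
  </head>
  <body>
    <tr>
      <td><span class="number blue">xx</span></td>
      <td>001</td>
      <td>002</td>
    </tr>
  </body>
</html>'''


def values_after_span(row, css_class='number blue'):
    """Return the text of every <td> that follows the cell holding the span.

    Hypothetical helper: assumes the wanted values come after the span's
    cell within the same <tr>.
    """
    cells = row.findall('td')
    for index, cell in enumerate(cells):
        # Look for a direct <span> child with the exact class value
        if cell.find("span[@class='%s']" % css_class) is not None:
            return [td.text for td in cells[index + 1:]]
    return []


tree = etree.fromstring(html)
values = [value for row in tree.iter('tr') for value in values_after_span(row)]
print(values)
```

With lxml itself, the same selection can be written as a single XPath call: tree.xpath("//td[span[@class='number blue']]/following-sibling::td/text()"). Note that @class='number blue' matches the attribute value exactly, so a different class order or an extra class on the real page would need a looser test.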

Messages In This Thread
RE: [Python 3] - Extract specific data from a web page using lxml module - by Takeshio - Aug-24-2018, 02:13 AM
