Python Forum
Web crawler extracting specific text from HTML
#1
Hi - I've just started learning Python and am exploring how a web crawler works. I'm trying to extract the text that follows "Licences" from this page (in this instance, I would like the result to be 'COPD Licence').

So far I have the basics:
import requests
from bs4 import BeautifulSoup

result = requests.get(
    "https://www.rightbreathe.com/medicines/eklira-322microgramsdose-genuair-astrazeneca-uk-ltd-60-dose/?s=")
src = result.content
soup = BeautifulSoup(src, 'html.parser')
but then I'm struggling to successfully define the specific element that I'm going after - can anyone help please?
#2
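You can use a CSS selector here. Many of the class names on the page are the same, so a selector that follows the path down to the element is easier than matching on class name alone.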
>>> soup.select('div.MedicineDeviceProduct-detail > div > div:nth-child(2) > div > span > ul > li')
[<li>COPD Licence</li>]
>>> soup.select_one('div.MedicineDeviceProduct-detail > div > div:nth-child(2) > div > span > ul > li')
<li>COPD Licence</li>
>>> soup.select_one('div.MedicineDeviceProduct-detail > div > div:nth-child(2) > div > span > ul > li').text
'COPD Licence'
You can find the selector in your browser's developer tools and copy it from there.
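A long positional selector like the one above can break if the site reorders its blocks, so another option is to locate the "Licences" label first and read the list items next to it. The sketch below is only an illustration: SAMPLE_HTML is a made-up stand-in that mirrors the selector path above (the real page may be structured differently), and extract_after_label is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; its actual structure may differ.
SAMPLE_HTML = """
<div class="MedicineDeviceProduct-detail">
  <div>
    <div><strong>Type</strong><div><span>Inhaler</span></div></div>
    <div><strong>Licences</strong><div><span><ul><li>COPD Licence</li></ul></span></div></div>
  </div>
</div>
"""


def extract_after_label(soup, label):
    """Return the text of each <li> in the block whose label text equals `label`.

    Assumes the label and its list share the same parent element, as in
    SAMPLE_HTML. Returns an empty list if the label is not found.
    """
    # Find the text node that matches the label exactly (ignoring whitespace).
    heading = soup.find(string=lambda s: s is not None and s.strip() == label)
    if heading is None:
        return []
    label_tag = heading.parent      # e.g. the <strong> wrapping the label
    container = label_tag.parent    # the block holding both label and list
    if container is None:
        return []
    return [li.get_text(strip=True) for li in container.find_all("li")]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(extract_after_label(soup, "Licences"))  # ['COPD Licence']
```

With the real page you would pass in the soup object built from result.content instead of SAMPLE_HTML. If the function returns an empty list, the label text or the nesting is probably different from what is assumed here, so check the markup in the browser first.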