Web crawler extracting specific text from HTML

lewdow · Jan-03-2020, 09:58 PM

Hi - I've just started learning Python and am exploring the elements of a web crawler. I'm trying to extract the text that follows "Licences" on this page (in this instance, I would like the result to be 'COPD Licence').

So far I have the basics:

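import requests
from bs4 import BeautifulSoup

# Download the page
result = requests.get(
    "https://www.rightbreathe.com/medicines/eklira-322microgramsdose-genuair-astrazeneca-uk-ltd-60-dose/?s=")
result.raise_for_status()  # stop early on a 4xx/5xx response
src = result.content
# Parse the raw HTML bytes into a searchable tree
soup = BeautifulSoup(src, 'html.parser')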

but I'm struggling to select the specific element I'm after. Can anyone help, please?

snippsat · Jan-03-2020, 11:21 PM

You can use a CSS selector here, since many of the class names on this page are the same.

>>> soup.select('div.MedicineDeviceProduct-detail > div > div:nth-child(2) > div > span > ul > li')
[<li>COPD Licence</li>]
>>> soup.select_one('div.MedicineDeviceProduct-detail > div > div:nth-child(2) > div > span > ul > li')
<li>COPD Licence</li>
>>> soup.select_one('div.MedicineDeviceProduct-detail > div > div:nth-child(2) > div > span > ul > li').text
'COPD Licence'
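To try the selector without hitting the live site, here is a minimal sketch run against a small, hypothetical HTML snippet that only mimics the nesting the selector expects (the real page markup may well differ). It also shows the practical difference between select() and select_one(): select() always returns a list, while select_one() returns the first match or None, so calling .text on it blindly raises an error when nothing matches. The names SAMPLE_HTML, licence_texts and first_licence are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only mimics the nesting the selector expects;
# the real rightbreathe.com page may be structured differently.
SAMPLE_HTML = """
<div class="MedicineDeviceProduct-detail">
  <div>
    <div><div><span>Some other field</span></div></div>
    <div>
      <div>
        <span>
          <ul><li>COPD Licence</li></ul>
        </span>
      </div>
    </div>
  </div>
</div>
"""

SELECTOR = ("div.MedicineDeviceProduct-detail > div > div:nth-child(2)"
            " > div > span > ul > li")


def licence_texts(html):
    """Return the text of every <li> matched by SELECTOR (may be empty)."""
    soup = BeautifulSoup(html, "html.parser")
    # select() always returns a list, so no None check is needed here
    return [li.get_text(strip=True) for li in soup.select(SELECTOR)]


def first_licence(html):
    """Return the first matched licence, or None if the selector finds nothing."""
    soup = BeautifulSoup(html, "html.parser")
    li = soup.select_one(SELECTOR)
    # select_one() returns None on no match; calling .text on None would raise
    return li.get_text(strip=True) if li is not None else None


print(licence_texts(SAMPLE_HTML))        # ['COPD Licence']
print(first_licence("<p>no match</p>"))  # None
```

Checking for None is worth it with a copied nth-child path, because it stops matching as soon as the site adds or reorders a section.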

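You can find the selector with your browser's developer tools and copy it from there (e.g. in Chrome, right-click the element in the Elements panel and choose Copy > Copy selector).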
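A copied selector depends on the exact position of every element along the path. Since the question is really about the text that follows the "Licences" label, another option is to search for that label and walk forward from it. The sketch below assumes, hypothetically, that the label is its own text node followed somewhere later by a ul list; the markup and the helper name texts_after_label are made up for illustration, so inspect the real page source before relying on it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: assumes a "Licences" label followed by a <ul> of names.
# The first block is a placeholder to show the search skips unrelated lists.
SAMPLE_HTML = """
<div class="MedicineDeviceProduct-detail">
  <div>
    <div><span class="label">Other</span>
      <span><ul><li>Not this one</li></ul></span>
    </div>
    <div><span class="label">Licences</span>
      <span><ul><li>COPD Licence</li></ul></span>
    </div>
  </div>
</div>
"""


def texts_after_label(html, label):
    """Return the <li> texts from the first <ul> that follows `label` in the document."""
    soup = BeautifulSoup(html, "html.parser")
    # Find the text node whose content, ignoring whitespace, is exactly the label
    label_node = soup.find(string=lambda s: s.strip() == label)
    if label_node is None:
        return []
    # find_next() searches forward through the document from the label
    ul = label_node.find_next("ul")
    if ul is None:
        return []
    return [li.get_text(strip=True) for li in ul.find_all("li")]


print(texts_after_label(SAMPLE_HTML, "Licences"))  # ['COPD Licence']
```

Because find_next() searches everything after the label, a Licences section with no list of its own would silently pick up the next unrelated ul on the page; limiting the search to the label's parent element (label_node.parent.find("ul")) is stricter, provided the list really sits inside that parent.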
