Python Forum (https://python-forum.io) - Thread: Web crawler extracting specific text from HTML (/thread-23522.html)
Web crawler extracting specific text from HTML - lewdow - Jan-03-2020

Hi - I've just started to learn how to use Python and am exploring the elements of a web crawler. I'm trying to extract the text that follows "Licences" from this page (in this instance, I would like the result to be 'COPD Licence', for example). So far I have the basics:

    import requests
    from bs4 import BeautifulSoup

    result = requests.get(
        "https://www.rightbreathe.com/medicines/eklira-322microgramsdose-genuair-astrazeneca-uk-ltd-60-dose/?s=")
    src = result.content
    soup = BeautifulSoup(src, 'html.parser')

but then I'm struggling to successfully define the specific element that I'm going after - can anyone help please?

RE: Web crawler extracting specific text from HTML - snippsat - Jan-03-2020

You can use a CSS selector here, as many of the class names are the same.

    >>> soup.select('div.MedicineDeviceProduct-detail > div > div:nth-child(2) > div > span > ul > li')
    [<li>COPD Licence</li>]
    >>> soup.select_one('div.MedicineDeviceProduct-detail > div > div:nth-child(2) > div > span > ul > li')
    <li>COPD Licence</li>
    >>> soup.select_one('div.MedicineDeviceProduct-detail > div > div:nth-child(2) > div > span > ul > li').text
    'COPD Licence'

You can find the selector in your browser and copy it.
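For anyone who wants to try the selector without hitting the site, here is a minimal sketch that runs it against an inline HTML snippet. The `SAMPLE_HTML` markup is a hypothetical stand-in that I wrote to match the selector's shape, and `extract_licences` is just an illustrative helper name; the real page's markup may differ or may have changed. It uses `select` rather than `select_one(...).text`, because `select_one` returns `None` when nothing matches and `.text` would then raise an `AttributeError`.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the product page; the real markup may differ.
SAMPLE_HTML = """
<div class="MedicineDeviceProduct-detail">
  <div>
    <div><div><span>Some other detail</span></div></div>
    <div>
      <div>
        <span>
          <ul><li>COPD Licence</li></ul>
        </span>
      </div>
    </div>
  </div>
</div>
"""

# Selector taken from the reply in this thread.
SELECTOR = (
    "div.MedicineDeviceProduct-detail > div > div:nth-child(2) "
    "> div > span > ul > li"
)


def extract_licences(html, selector=SELECTOR):
    """Return the stripped text of every element matching the selector.

    An empty list means nothing matched, so the caller never has to
    deal with None.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [li.get_text(strip=True) for li in soup.select(selector)]


licences = extract_licences(SAMPLE_HTML)
print(licences)
```

To run it on the live page, you could pass `requests.get(url).text` as the `html` argument, with `url` being the address from the first post. If the list comes back empty, the page structure probably no longer matches the selector, and it is worth copying a fresh one from the browser.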