Python Forum
Web Scraping on href text



RE: Web Scraping on href text - Superzaffo - Nov-15-2019

Ok. No problem.
I'm in no hurry.
Thank you

:-D


RE: Web Scraping on href text - Superzaffo - Nov-16-2019

I wrote this code, based on your example (thank you):
from bs4 import BeautifulSoup
import requests
 
 
class ScrapeOrchids:
    def __init__(self):
        self.main_url = 'http://www.orchidspecies.com/indexe-ep.htm'
        self.links = {}
        self.get_initial_list()
        self.show_links()
     
    def get_initial_list(self):
        baseurl = 'http://www.orchidspecies.com/'
        response = requests.get(self.main_url, timeout=10)  # don't hang forever on a dead server
        if response.status_code == 200:
            page = response.content
            soup = BeautifulSoup(page, 'lxml')  # the 'lxml' parser must be installed (pip install lxml)
            # A CSS selector can be found with the browser's Inspect Element, then right-click --> Copy --> CSS Selector
            for i in soup.select("li"):
                if i.a is None:  # skip <li> items that contain no link
                    continue
                # print(i.a.text)
                if 'Epiblastus lancipetalus' in i.a.text:
                    # print(i.a.get('href'))
                    self.links[i.a.text.strip()] = f"{baseurl}{i.a.get('href')}"
        else:
            print(f"Problem fetching {self.main_url}")
 
    def show_links(self):
        for key, value in self.links.items():
            print(f"{key}: {value}")
 
 
if __name__ == '__main__':
    ScrapeOrchids()
This is the result:
Output:
Epiblastus lancipetalus Schltr. 1911: http://www.orchidspecies.com/epiblancipetalus.htm
which is what I want.
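A side note on the link building: the script glues the href onto the base URL with f"{baseurl}{href}", which works as long as every href on the index page is a bare relative filename such as epiblancipetalus.htm. A slightly more robust sketch uses the standard library's urllib.parse.urljoin, which resolves the href against the page it came from and leaves absolute hrefs alone. The helper name absolute_link is only illustrative.

```python
from urllib.parse import urljoin

PAGE_URL = "http://www.orchidspecies.com/indexe-ep.htm"


def absolute_link(page_url, href):
    """Resolve an <a href> value against the page it was found on (hypothetical helper)."""
    return urljoin(page_url, href.strip())


# A bare filename resolves relative to the index page's directory.
print(absolute_link(PAGE_URL, "epiblancipetalus.htm"))
# -> http://www.orchidspecies.com/epiblancipetalus.htm

# An absolute href is kept as-is instead of being glued onto the base URL.
print(absolute_link(PAGE_URL, "http://example.com/other.htm"))
# -> http://example.com/other.htm
```

Inside the loop this would read absolute_link(self.main_url, i.a.get('href')), and the separate baseurl variable would no longer be needed.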
Now I need to follow that new link and save the orchid's image from that page into an Excel file. :-(
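One possible approach, sketched here and not tested against the live site: fetch the species page, collect its image URLs with BeautifulSoup, download the first one with requests and embed it in a workbook with openpyxl. Assumptions: the photos are ordinary <img> tags whose src ends in .jpg or .jpeg (check this with Inspect Element), openpyxl and Pillow are installed (openpyxl needs Pillow to embed pictures), and the function names find_image_urls and save_first_image_to_excel are hypothetical. It uses Python's built-in "html.parser" so lxml isn't required; 'lxml' as in the script above works too.

```python
from io import BytesIO
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup
from openpyxl import Workbook
from openpyxl.drawing.image import Image as XLImage  # needs Pillow when an image is created


def find_image_urls(html, page_url, extensions=(".jpg", ".jpeg")):
    """Return absolute URLs of <img> tags whose src ends with one of `extensions`.

    Assumption: the species photos are plain <img> tags; adjust the
    filter after inspecting the real page in the browser.
    """
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for img in soup.find_all("img", src=True):  # only <img> tags that have a src
        src = img["src"].strip()
        if src.lower().endswith(extensions):
            urls.append(urljoin(page_url, src))  # make relative src values absolute
    return urls


def save_first_image_to_excel(name, page_url, xlsx_path):
    """Fetch page_url, download its first matching image and embed it in a new workbook."""
    page = requests.get(page_url, timeout=10)
    page.raise_for_status()
    image_urls = find_image_urls(page.content, page_url)
    if not image_urls:
        print(f"No image found on {page_url}")
        return False

    picture = requests.get(image_urls[0], timeout=10)
    picture.raise_for_status()

    wb = Workbook()
    ws = wb.active
    ws["A1"] = name
    ws["B1"] = page_url
    # Anchor the picture's top-left corner at cell A2, below the text row.
    ws.add_image(XLImage(BytesIO(picture.content)), "A2")
    wb.save(xlsx_path)
    return True
```

For example: save_first_image_to_excel("Epiblastus lancipetalus Schltr. 1911", "http://www.orchidspecies.com/epiblancipetalus.htm", "orchids.xlsx"). Each call writes a fresh workbook, so for many species you would create one Workbook, loop over self.links and anchor each picture on its own row instead.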