Jul-08-2021, 11:32 AM
Hello all,
I'm new to Python and am practicing web scraping by challenging myself to extract various elements from different websites. In this exercise, I've become stuck trying to extract the URL and the anchor text from each link in an HTML list on a site (as shown in the output below). After hours of trying to resolve this, I thought I would ask for some assistance.
From what I've read, you need to create a for loop within a loop, and although I've tried many different variations, I must admit I'm still confused.

I've been able to use the following 'for loop' to almost get the results I'm after:
from bs4 import BeautifulSoup
import requests
import pandas as pd

url = [mytesturl]
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
full_list = soup.findAll('ol', {'class': 'nav browse-group-list'})
for category in full_list:
    group_list = category.findAll('li')
    for weblink in group_list:
        url = weblink.findAll('a')
        print(url)
The results I'm getting from this code are:
Output:
[<a href="/tour-operator-software/">Tour Operator Software</a>]
[<a href="/treasury-software/">Treasury Software</a>]
[<a href="/trucking-software/">Trucking Software</a>]
[<a href="/trust-accounting-software/">Trust Accounting Software</a>]
[<a href="/tutoring-software/">Tutoring Software</a>]
[<a href="/unified-communications-software/">Unified Communications Software </a>]
[<a href="/unified-endpoint-management-software/">Unified Endpoint Management (UEM) Software</a>]
[<a href="/url-shortener-software/">URL Shortener</a>]
[<a href="/user-testing-software/">User Testing Software</a>]
[<a href="/utility-billing-software/">Utility Billing Software</a>]
[<a href="/utility-management-systems-software/">Utility Management Systems Software</a>]
[<a href="/ux-software/">UX Software</a>]
[<a href="/vacation-rental-software/">Vacation Rental Software</a>]
[<a href="/vaccine-management-software/">Vaccine Management Software</a>]
[<a href="/vdi-software/">VDI Software</a>]
However, I want to extract both the URL (for example, /vdi-software/) and the anchor text (for example, VDI Software) as separate values, and I'm unsure what to use. I would really appreciate some assistance.
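
For anyone following along, here is a minimal sketch of one way this might be approached. It is not a definitive answer. In Beautiful Soup, each `<a>` tag exposes its attributes through dictionary-style access (so `anchor['href']` gives the URL), and `get_text()` returns the anchor text. The `sample_html` string and the `extract_links` helper are hypothetical stand-ins for illustration. The markup is assumed to resemble the real page, based only on the output shown above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page (an assumption;
# the actual page would be fetched with requests.get(...) as in the post).
sample_html = """
<ol class="nav browse-group-list">
  <li><a href="/ux-software/">UX Software</a></li>
  <li><a href="/vdi-software/">VDI Software</a></li>
</ol>
"""


def extract_links(html):
    """Return a list of (href, anchor_text) tuples found in the category lists."""
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    for category in soup.find_all('ol', {'class': 'nav browse-group-list'}):
        # href=True skips any <a> tags that have no href attribute
        for anchor in category.find_all('a', href=True):
            href = anchor['href']                # the URL, e.g. /vdi-software/
            text = anchor.get_text(strip=True)   # the anchor text, whitespace trimmed
            links.append((href, text))
    return links


links = extract_links(sample_html)
for href, text in links:
    print(href, text)
```

Because `find_all('a')` already searches all descendants of the list, the inner loop over `li` elements may not be needed. If a table is wanted later, the list of tuples could be passed to `pd.DataFrame(links, columns=['url', 'anchor_text'])`.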