Python Forum
Using BeautifulSoup: Getting only First Result. Also, trouble with nesting.
#1
From this site https://azure.microsoft.com/en-us/products/ I'm trying to extract all products and product descriptions per category, so that each category and category description repeats as many times as there are products under it. I am focusing on the main body of the page, not the left nav bar.

I have two problems.
1. How do I extract all 21 categories? There are 21 categories, starting with 'AI + machine learning' and ending with 'Web', and within each category are 10-20 products with their descriptions. I get only the first.
2. How do I nest the extraction of the products and product descriptions within each category and category description?

Here is my code: (unable to put inside a code box, that feature is not working)

import requests
from bs4 import BeautifulSoup
import pandas

url="https://azure.microsoft.com/en-us/services/"

response = requests.get(url)
soup=BeautifulSoup(response.content, features="html.parser")

#wish to extract all 21 categories and category descriptions - why does it return only the first?
for div in soup.find_all('div',id='products-list'):
    header = div.find('h2').text
    print(header)
    head_desc = div.find('p').text
    print(head_desc)

#wish to extract all products and their descriptions - this works but how to nest within the upper code?
for div in soup.find_all('div', class_='column medium-6 end'):
    product = div.find('span').text
    print(product)
    prod_desc = div.find('p').text
    print(prod_desc)
Thank you so much in advance. If you have any tips for using code formatting, please let me know; I'm a first-time poster. polkadot.
Larz60+ write May-04-2023, 08:01 PM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
Fixed for you this time. Please use BBCode tags on future posts.
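On problem 1, a likely explanation: an id is meant to be unique on a page, so find_all('div', id='products-list') yields a single div, the loop body runs once, and div.find('h2') returns only the first h2 inside it. Here is a minimal sketch of the difference between find() and find_all(). The HTML fragment is made up for illustration and is not copied from the Azure page, so the real markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one container with a unique id holding several categories.
SAMPLE_HTML = """
<div id="products-list">
  <h2>AI + machine learning</h2><p>Category description A</p>
  <h2>Analytics</h2><p>Category description B</p>
  <h2>Web</h2><p>Category description C</p>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, features="html.parser")

# Only one element carries this id, so a loop over this list runs once.
containers = soup.find_all("div", id="products-list")
container_count = len(containers)

# find() stops at the first matching tag inside the container ...
first_only = containers[0].find("h2").text

# ... while find_all() collects every matching tag.
all_headers = [h2.text for h2 in containers[0].find_all("h2")]

print(container_count)
print(first_only)
print(all_headers)
```

If the real page follows a similar layout, swapping find('h2') for find_all('h2') inside the container should list every category.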
#2
Try:
import requests
from bs4 import BeautifulSoup

url="https://azure.microsoft.com/en-us/services/"

response = requests.get(url)
soup=BeautifulSoup(response.content, features="html.parser")

ul = soup.find('ul', {'id': 'popular-solutions-filter-list'})
lis = ul.find_all('li')
for li in lis:
    xurl = li.a.get('href')
    category = li.text.strip()
    url = f"https://azure.microsoft.com/en-us/products/{xurl}"
    print(f"category: {category}, url = {url}")
Results:
Output:
category: AI + machine learning, url = https://azure.microsoft.com/en-us/products/
category: Analytics, url = https://azure.microsoft.com/en-us/products/
category: Compute, url = https://azure.microsoft.com/en-us/products/
category: Containers, url = https://azure.microsoft.com/en-us/products/
category: Databases, url = https://azure.microsoft.com/en-us/products/
category: Developer tools, url = https://azure.microsoft.com/en-us/products/
category: DevOps, url = https://azure.microsoft.com/en-us/products/
category: Hybrid + multicloud, url = https://azure.microsoft.com/en-us/products/
category: Identity, url = https://azure.microsoft.com/en-us/products/
category: Integration, url = https://azure.microsoft.com/en-us/products/
category: Internet of Things, url = https://azure.microsoft.com/en-us/products/
category: Management and governance, url = https://azure.microsoft.com/en-us/products/
category: Media, url = https://azure.microsoft.com/en-us/products/
category: Migration, url = https://azure.microsoft.com/en-us/products/
category: Mixed reality, url = https://azure.microsoft.com/en-us/products/
category: Mobile, url = https://azure.microsoft.com/en-us/products/
category: Networking, url = https://azure.microsoft.com/en-us/products/
category: Security, url = https://azure.microsoft.com/en-us/products/
category: Storage, url = https://azure.microsoft.com/en-us/products/
category: Virtual desktop infrastructure, url = https://azure.microsoft.com/en-us/products/
category: Web, url = https://azure.microsoft.com/en-us/products/
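For the nesting question, one possible approach is to walk the category headers and product blocks in document order and remember the most recent category while collecting products. This is only a sketch: the HTML below is hypothetical (it reuses the 'column medium-6 end' class from the question, but the real page layout may differ), and is_product is a helper name invented for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each category header is followed by its product blocks.
SAMPLE_HTML = """
<div id="products-list">
  <div class="row"><h2>AI + machine learning</h2><p>Category description A</p></div>
  <div class="row">
    <div class="column medium-6 end"><span>Product 1</span><p>Description 1</p></div>
    <div class="column medium-6 end"><span>Product 2</span><p>Description 2</p></div>
  </div>
  <div class="row"><h2>Analytics</h2><p>Category description B</p></div>
  <div class="row">
    <div class="column medium-6 end"><span>Product 3</span><p>Description 3</p></div>
  </div>
</div>
"""


def is_product(tag):
    """Return True for a div whose classes include 'column' and 'medium-6'."""
    classes = tag.get("class") or []
    return tag.name == "div" and "column" in classes and "medium-6" in classes


soup = BeautifulSoup(SAMPLE_HTML, features="html.parser")

rows = []
category = ""
category_desc = ""

# find_all() with a function returns matching tags in document order,
# so every product is seen after the header of the category it belongs to.
for tag in soup.find_all(lambda t: t.name == "h2" or is_product(t)):
    if tag.name == "h2":
        category = tag.get_text(strip=True)
        desc = tag.find_next_sibling("p")
        category_desc = desc.get_text(strip=True) if desc else ""
    else:
        name = tag.find("span")
        prod_desc = tag.find("p")
        rows.append({
            "category": category,
            "category_desc": category_desc,
            "product": name.get_text(strip=True) if name else "",
            "product_desc": prod_desc.get_text(strip=True) if prod_desc else "",
        })

for row in rows:
    print(row)
```

Each row repeats the category and its description once per product, which is the shape asked for in the question. The list of dicts can be passed to pandas.DataFrame(rows) if a table is wanted.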