Python Forum

Full Version: Soup('A')
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Hello,

I've been going through an introductory Python book that includes some material on web scraping using BeautifulSoup. My question is about the final three lines in the below code:

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = "https://docs.python.org"
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')

# Retrieve all of the anchor tags
tags = soup('a')
for tag in tags:
     print(tag.get('href', None))
I understand what the code is doing (and it works on my computer) but I'm curious about what's going on in the line "tags = soup('a')." The get() method used in the for loop on the next two lines suggests that "soup('a')" is referring to a dictionary. But if that were the case, shouldn't the code be written with square brackets as "tags = soup['a']"? Also, when I print tags the output I get indicates to me that tags is a list (the output starts with a square bracket and ends with a square bracket).

I tried looking through the BeautifulSoup documentation for any insights on this but am still unclear.

Thanks in advance for any help.
I would have used: tags=soup.find_all('a')
(Aug-31-2022, 12:23 PM)Larz60+ Wrote: [ -> ]I would have used: tags=soup.find_all('a')

Thanks Larz60+, that definitely makes more sense to me (and produces the same result).
1.> soup('a') is not referring to a dictionary, but it's actually filtering and extracting all <a> tags from the parsed HTML, returning a ResultSet (a list-like object).
2.> The code tags = soup('a') assigns this list of Tag objects to the variable tags.
3.> When you print tags, it displays the representation of the ResultSet, which may look like a list.

Your understanding is correct: the square brackets indicate a list-like structure, but the actual content inside those brackets is a collection of Tag objects, not dictionary keys or values.
Thanjs this cide
# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')

# Retrieve all of the anchor tags
tags = soup('a')
for tag in tags:
print(tag.get('href', None))
I understand what the code is doing (and it works on my computer) but I'm curious about what's going on in the line "tags = soup('a')." The get() method used in the for loop on the next two lines suggests that "soup('a')" is referring to a dictionary. But if that were the case, shouldn't the code be written with square brackets as "tags = soup['a']"? Also, when I print tags the output I get I understand ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONEwhat the code is doing (and it works on my computer) but I'm curious about what's going on in the line "tags = soup('a')." The get method used in the for loop on the next two lines suggests that "soup('a')" is referring to a dictionary. But if that were the case, shouldn't the code be written with square brackets as "tags = soup['a']"? Also, when I print tags the output I get indicates to me that tags is a list (the output starts with a square bracket and ends with a square bracket).
# Retrieve all of the anchor tags
tags = soup('a')
for tag in tags:
print(tag.get('href', None))
I understand what the code is doing (and it works on my computer) but I'm curious about what's going on in the line "tags = soup('a')." The get() method used in the for loop on the next two lines suggests that "soup('a')" is referring to a dictionary. But if that were the case, shouldn't the code be written with square brackets as "tags = soup['a']"? Also, when I print tags the output I get I understand ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONEwhat the code is doing (and it works on my computer) but I'm curious about what's going on in the line "tags = soup('a')." The get method used in the for loop on the next two lines suggests that "soup('a')" is referring to a dictionary. But if that were the case, shouldn't the code be written with square brackets as "tags = soup['a']"? Also, when I print tags the output I get indicates to me that tags is a list (the output starts with a square bracket and ends with a square bracket).