Soup('A')

new_coder_231013 · Aug-31-2022, 12:09 PM

Hello,

I've been going through an introductory Python book that includes some material on web scraping using BeautifulSoup. My question is about the final three lines in the below code:

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = "https://docs.python.org"
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')

# Retrieve all of the anchor tags
tags = soup('a')
for tag in tags:
    print(tag.get('href', None))

I understand what the code is doing (and it works on my computer), but I'm curious about what's going on in the line "tags = soup('a')". The get() method used in the for loop on the next two lines suggests that "soup('a')" refers to a dictionary. But if that were the case, shouldn't the code be written with square brackets, as "tags = soup['a']"? Also, when I print tags, the output suggests that tags is a list (it starts and ends with a square bracket).

I tried looking through the BeautifulSoup documentation for any insights on this but am still unclear.

Thanks in advance for any help.

**Larz60+** · Aug-31-2022, 12:23 PM

I would have used: tags=soup.find_all('a')
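
As far as I know, the two spellings are interchangeable, because calling a BeautifulSoup object is a shortcut for find_all(). Here is a small check you can run yourself. The HTML string is made up for illustration so that no network request is needed, and the variable names are my own:

```python
from bs4 import BeautifulSoup

# Made-up HTML (an assumption for this example) so nothing is downloaded.
html = '<p><a href="/one">One</a> <a id="no-link">Two</a> <a href="/three">Three</a></p>'
soup = BeautifulSoup(html, "html.parser")

short_form = soup("a")          # calling the soup object...
long_form = soup.find_all("a")  # ...versus spelling out find_all()

print(short_form == long_form)       # do both forms find the same tags?
print(type(short_form).__name__)     # name of the returned type
print(isinstance(short_form, list))  # is the result a kind of list?

# get() is called on each Tag and returns None when there is no href.
hrefs = [tag.get("href") for tag in short_form]
print(hrefs)
```

I find find_all() easier to read because the method name says what it does, but that is a style preference.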

new_coder_231013 · Sep-12-2022, 12:02 PM

(Aug-31-2022, 12:23 PM)Larz60+ Wrote: I would have used: tags=soup.find_all('a')

Thanks Larz60+, that definitely makes more sense to me (and produces the same result).

Gaurav_Kumar · Aug-09-2023, 11:49 AM

1. soup('a') is not a dictionary lookup. Calling the soup object is shorthand for soup.find_all('a'): it finds every <a> tag in the parsed HTML and returns a ResultSet, which is a subclass of list.
2. The line tags = soup('a') therefore assigns that list of Tag objects to the variable tags.
3. When you print tags, you see the ResultSet's representation, which is why the output starts and ends with square brackets.

You are right that tags is list-like: its items are Tag objects, not dictionary keys or values. The get() call in the loop is made on each Tag, not on tags. tag.get('href', None) returns the value of the tag's href attribute, or None if the tag has no href, which mirrors how dict.get() behaves.
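
If it helps to see why parentheses and get() can work on objects that are not functions or dictionaries, here is a rough sketch using only the standard library. ToySoup and ToyTag are hypothetical classes I made up for illustration. They are not how BeautifulSoup is actually implemented; they only show the general Python mechanism (a __call__ method, and a get() method that forwards to an attribute dictionary):

```python
from html.parser import HTMLParser


class ToyTag:
    """Hypothetical stand-in for a parsed tag; not bs4's Tag class."""

    def __init__(self, name, attrs):
        self.name = name
        self.attrs = dict(attrs)  # HTML attributes stored as a plain dict

    def get(self, key, default=None):
        # Forward to the attribute dict, which is why get() feels dict-like.
        return self.attrs.get(key, default)

    def __getitem__(self, key):
        # Square brackets look up an attribute; KeyError if it is missing.
        return self.attrs[key]


class ToySoup(HTMLParser):
    """Hypothetical stand-in for a parsed document; not bs4's BeautifulSoup."""

    def __init__(self, markup):
        super().__init__()
        self.found = []  # every start tag seen, in document order
        self.feed(markup)
        self.close()

    def handle_starttag(self, tag, attrs):
        self.found.append(ToyTag(tag, attrs))

    def find_all(self, name):
        return [t for t in self.found if t.name == name]

    def __call__(self, name):
        # Defining __call__ is what makes soup('a') legal syntax.
        return self.find_all(name)


soup = ToySoup('<a href="/docs">Docs</a><b>bold</b><a name="top">Top</a>')
tags = soup("a")                                 # parentheses invoke __call__
print(type(tags).__name__, len(tags))            # an ordinary list of tags
print([tag.get("href", None) for tag in tags])   # get() on each tag
print(tags[0]["href"])                           # brackets work on a tag too
```

So square brackets do have a use here, but on an individual tag (tag['href']) and not on the soup object when you want to search for tags.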

Pubfonts · Aug-09-2023, 01:11 PM

Thanks for this code.

