Python Forum
Soup('A')
#1
Hello,

I've been going through an introductory Python book that includes some material on web scraping using BeautifulSoup. My question is about the final three lines in the below code:

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = "https://docs.python.org"
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')

# Retrieve all of the anchor tags
tags = soup('a')
for tag in tags:
    print(tag.get('href', None))
I understand what the code is doing (and it works on my computer) but I'm curious about what's going on in the line "tags = soup('a')." The get() method used in the for loop on the next two lines suggests that "soup('a')" is referring to a dictionary. But if that were the case, shouldn't the code be written with square brackets as "tags = soup['a']"? Also, when I print tags the output I get indicates to me that tags is a list (the output starts with a square bracket and ends with a square bracket).

I tried looking through the BeautifulSoup documentation for any insights on this but am still unclear.

Thanks in advance for any help.
#2
I would have used: tags=soup.find_all('a')
#3
(Aug-31-2022, 12:23 PM)Larz60+ Wrote: I would have used: tags=soup.find_all('a')

Thanks Larz60+, that definitely makes more sense to me (and produces the same result).
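
For anyone who wants to check that the two spellings really match without hitting the network, here is a small sketch. It assumes bs4 is installed, and the HTML string is made up for illustration (it is not taken from docs.python.org).

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration only.
html = '<p><a href="/one">One</a> <a name="no-link">Two</a></p>'
soup = BeautifulSoup(html, 'html.parser')

called = soup('a')           # calling the soup object directly
found = soup.find_all('a')   # the explicit spelling

print(called == found)            # compares the two results tag by tag
print(type(called).__name__)      # name of the returned container class
print(isinstance(called, list))   # whether that container is a list

# get() reads an attribute from each tag and falls back to None when missing.
hrefs = [tag.get('href') for tag in called]
print(hrefs)
```

The second anchor has no href attribute, so get() returns None for it instead of raising an error.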
#4
1. soup('a') is not a dictionary lookup. Calling the soup object is shorthand for soup.find_all('a'): it finds every <a> tag in the parsed HTML and returns a ResultSet, which is a subclass of list.
2. The line tags = soup('a') assigns that ResultSet of Tag objects to the variable tags.
3. When you print tags, you see the ResultSet's representation, which looks like a list because it is one.

Your reading of the output is correct: the square brackets indicate a list, and each item inside is a Tag object, not a dictionary key or value. The get() call in the loop is a method on each Tag that looks up one of the tag's attributes, much like dict.get(), which is why it returns None when an <a> tag has no href.
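
To see why parentheses work here, a toy model may help. The classes MiniTag and MiniSoup below are hypothetical and use only the standard library. They are not BeautifulSoup's actual code, just a sketch of the two ideas involved: an object becomes callable when its class defines __call__, and an object that is not a dict can still offer a dict-style get().

```python
class MiniTag:
    """Hypothetical stand-in for a parsed element (not the real bs4 Tag)."""

    def __init__(self, name, attrs):
        self.name = name
        self.attrs = dict(attrs)  # attributes are stored in a plain dict

    def get(self, key, default=None):
        # Delegate to the attribute dict, so this behaves like dict.get().
        return self.attrs.get(key, default)

    def __getitem__(self, key):
        # Square brackets also reach the attribute dict, but raise KeyError
        # when the attribute is missing.
        return self.attrs[key]


class MiniSoup:
    """Hypothetical stand-in for a parsed document."""

    def __init__(self, tags):
        self.tags = list(tags)

    def find_all(self, name):
        # Return a list of every tag with the requested name.
        return [tag for tag in self.tags if tag.name == name]

    def __call__(self, name):
        # Defining __call__ is what makes mini_soup('a') legal syntax.
        return self.find_all(name)


mini_soup = MiniSoup([
    MiniTag('a', {'href': '/one'}),
    MiniTag('p', {}),
    MiniTag('a', {}),  # an anchor with no href
])

tags = mini_soup('a')  # same as mini_soup.find_all('a')
hrefs = [tag.get('href', None) for tag in tags]
print(hrefs)
```

So the parentheses belong to the soup object (a call that returns a list), while get() belongs to each item in that list.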
#5
Thanks for this code.

