Aug-31-2022, 12:09 PM
Hello,
I've been going through an introductory Python book that includes some material on web scraping using BeautifulSoup. My question is about the last three lines of the code below:

    import urllib.request, urllib.parse, urllib.error
    from bs4 import BeautifulSoup
    import ssl

    # Ignore SSL certificate errors
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    url = "https://docs.python.org"
    html = urllib.request.urlopen(url, context=ctx).read()
    soup = BeautifulSoup(html, 'html.parser')

    # Retrieve all of the anchor tags
    tags = soup('a')
    for tag in tags:
        print(tag.get('href', None))

I understand what the code is doing (and it works on my computer), but I'm curious about what's going on in the line "tags = soup('a')". The get() method used in the for loop on the next two lines suggests that "soup('a')" refers to a dictionary. But if that were the case, shouldn't the code be written with square brackets, as "tags = soup['a']"? Also, when I print tags, the output suggests that tags is a list (it starts and ends with a square bracket).
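To narrow the question down, here is a minimal sketch that inspects the types involved. It assumes bs4 is installed, and it uses a small made-up HTML string (not the real docs.python.org page) so it runs without a network connection. As far as I can tell, calling the soup object gives the same result as soup.find_all('a'), the result is a list-like object, and each element is a Tag whose get() looks up an HTML attribute rather than a dictionary key.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical inline HTML standing in for the downloaded page.
html = '<p><a href="https://docs.python.org/3/">Docs</a> <a name="top">No href</a></p>'
soup = BeautifulSoup(html, "html.parser")

# Calling the soup object with a tag name is shorthand for find_all().
tags = soup("a")
same_tags = soup.find_all("a")

# The result is list-like: bs4's result type is a subclass of list.
is_list = isinstance(tags, list)

# Each element is a Tag, not a dict. Tag.get() reads an HTML attribute
# and returns the default (None here) when the attribute is missing.
first_is_tag = isinstance(tags[0], Tag)
hrefs = [tag.get("href", None) for tag in tags]

print(type(tags).__name__, is_list)
print(type(tags[0]).__name__, first_is_tag)
print(hrefs)
```

So the parentheses in soup('a') are an ordinary call on the soup object, and the square brackets in the printed output come from the list-like result, not from a dictionary.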
I tried looking through the BeautifulSoup documentation for any insights on this but am still unclear.
Thanks in advance for any help.