Python Forum
Simple For Loop Question
#1
The purpose of the program is to go to a website, make a list of its links, then visit each link and print its canonical tag. It mostly works. However, I have one issue: when a page has no canonical tag, the program simply moves on instead of printing "none" or something similar. How would I remedy this? Any ideas?

from bs4 import BeautifulSoup
import requests
import re

# Fetch the start page and parse it
site = requests.get('http://www.angelfire.com/comics/gameroom/').text
qqq = BeautifulSoup(site, 'html.parser')

# Visit every absolute http:// link on the start page
for item in qqq.find_all('a', attrs={'href': re.compile("^http://")}):
    listoflinks = item.get('href').split()  # one-element list holding the URL
    print("link= ", listoflinks)
    for x in listoflinks:
        sit = requests.get(x).text
        ppp = BeautifulSoup(sit, 'html.parser')
        # Print the href of every <link rel="canonical"> on the linked page
        for y in ppp.find_all('link', {"rel": "canonical"}):
            lll = y.get('href').split()
            print(" canonical= ", lll)
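Why nothing is printed: `find_all()` returns an empty list when a page has no matching tag, and a `for` loop over an empty list runs zero times, so there is no line of output at all for that page. A minimal sketch of the same inner loop, run on a made-up HTML snippet (an assumption for illustration) that has no canonical tag:

```python
from bs4 import BeautifulSoup

# Made-up page with no <link rel="canonical"> tag
html = "<html><head><title>No canonical here</title></head><body></body></html>"
soup = BeautifulSoup(html, "html.parser")

matches = soup.find_all("link", {"rel": "canonical"})
print(len(matches))  # 0: find_all() found nothing

lines_printed = 0
for y in matches:
    # Never runs: there is nothing to iterate over
    print(" canonical= ", y.get("href"))
    lines_printed += 1

print("lines printed:", lines_printed)  # lines printed: 0
```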
#2
Never mind. In case anyone sees this in the future, here are the changes I made:
from bs4 import BeautifulSoup
import requests
import re

# Fetch the start page and parse it
site = requests.get('http://www.angelfire.com/comics/gameroom/').text
qqq = BeautifulSoup(site, 'html.parser')

# Visit every absolute http:// link on the start page
for item in qqq.find_all('a', attrs={'href': re.compile("^http://")}):
    listoflinks = item.get('href').split()  # one-element list holding the URL
    print("Original Link= ", listoflinks)
    for x in listoflinks:
        sit = requests.get(x).text
        ppp = BeautifulSoup(sit, 'html.parser')
        # Collect every canonical href; the list is empty if the page has none
        y = ppp.find_all('link', {"rel": "canonical"})
        lll = [t.get('href') for t in y]
        if not lll:
            # No canonical tag: print a placeholder instead of skipping silently
            print("Canonical Link = N/A")
        else:
            print("Canonical Link=", lll)
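An alternative sketch, not the poster's code: `find()` returns the first matching tag or `None`, which makes the "no canonical tag" case explicit without building a list. The function names `canonical_of` and `report_canonicals` are hypothetical, and the 10-second timeout, the request error handling, the `urljoin` step for relative hrefs, and accepting `https://` links as well as `http://` are assumptions added for illustration.

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def canonical_of(html, base_url=""):
    """Return the canonical URL declared in html, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    # find() returns the first matching tag or None, whereas find_all()
    # returns an empty list that a for loop silently skips.
    tag = soup.find("link", rel="canonical")
    if tag is None or not tag.get("href"):
        return None
    # Resolve relative hrefs such as "/page.html" against the page's URL.
    return urljoin(base_url, tag["href"])


def report_canonicals(start_url):
    """Print each absolute link on start_url with its canonical URL or N/A."""
    import requests  # imported here so canonical_of() works without requests

    page = requests.get(start_url, timeout=10)
    soup = BeautifulSoup(page.text, "html.parser")
    for a in soup.find_all("a", href=re.compile(r"^https?://")):
        link = a["href"]
        try:
            resp = requests.get(link, timeout=10)
            resp.raise_for_status()  # treat 4xx/5xx responses as failures
        except requests.RequestException as exc:
            print(f"{link} -> request failed ({exc})")
            continue
        canonical = canonical_of(resp.text, base_url=link)
        print(f"{link} -> {canonical or 'N/A'}")
```

Calling `report_canonicals('http://www.angelfire.com/comics/gameroom/')` should print one line per link, with `N/A` wherever the linked page has no canonical tag. Keeping the parsing in its own function also means it can be checked against sample HTML without any network access.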
#3
Thanks for sharing :)