Nov-23-2017, 02:52 AM
The purpose of the program is to go to a website, make a list of the links, then go to each link and report back the canonical tag. More or less, it works fine. However, I do have one issue: when there is no canonical tag, the program simply moves on instead of printing "none" or something similar. How would I remedy this? Any ideas?
from bs4 import BeautifulSoup
import requests
import re

site = requests.get('http://www.angelfire.com/comics/gameroom/').text
qqq = BeautifulSoup(site, 'html.parser')
for item in qqq.findAll('a', attrs={'href': re.compile("^http://")}):
    listoflinks = (item.get('href').split())
    print("link= ", listoflinks)
    for x in listoflinks:
        sit = requests.get((x)).text
        ppp = BeautifulSoup(sit, 'html.parser')
        for y in ppp.findAll('link',{"rel":"canonical"}):
            lll = (y.get('href').split())
            print(" canonical= ", lll)
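The likely cause is that the innermost for loop iterates over the result of findAll, and when a page has no canonical tag that result is empty, so the loop body (including the print) never runs. One possible fix is to look the tag up with find, which returns None when nothing matches, and test for that explicitly. The sketch below is only an illustration: the helper name find_canonical and the two sample HTML strings are made up for this example, and it works on HTML strings so it can be tried without any network access.

```python
from bs4 import BeautifulSoup


def find_canonical(html):
    """Return the href of the first canonical link in html, or None if absent.

    find_canonical is a hypothetical helper name, not part of the original script.
    """
    soup = BeautifulSoup(html, 'html.parser')
    # find() returns None when no tag matches, unlike findAll(),
    # which returns an empty list that a for loop silently skips.
    tag = soup.find('link', rel='canonical')
    if tag is None:
        return None
    return tag.get('href')


# Made-up sample pages, standing in for requests.get(x).text
with_tag = '<html><head><link rel="canonical" href="http://example.com/page"></head></html>'
without_tag = '<html><head><title>No canonical here</title></head></html>'

for page in (with_tag, without_tag):
    canonical = find_canonical(page)
    print(" canonical= ", canonical if canonical else "none")
```

In the original script, the same check could replace the innermost loop: call the helper on sit and print "none" whenever it returns None.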