Oct-22-2016, 07:53 PM
Here is the code:
Thank you
from urllib.request import urlopen from bs4 import BeautifulSoup import re pages = set() def getLinks(pageUrl): global pages html = urlopen("http://en.wikipedia.org"+pageUrl) try: print(bsObj.h1.get_text()) print(bsObj.find(id ="mw-content-text").findAll("p")[0]) print(bsObj.find(id="ca-edit").find("span").find("a").attrs['href']) except AttributeError: print("This page is missing something! No worries though!") for link in bsObj.findAll("a", href=re.compile("^(/wiki/)")): if 'href' in link.attrs: if link.attrs['href'] not in pages: #We have encountered a new page newPage = link.attrs['href'] print("----------------\n"+newPage) pages.add(newPage) getLinks(newPage) getLinks("")That it, I found this on the web and change some of it, the findAll is defined I think = print(bsObj.find(id ="mw-content-text").findAll("p")[0])
Thank you