Jan-04-2020, 06:19 PM
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

# Links already seen, shared by every recursive call
pages = set()

def getLinks(pageUrl):
    # pageUrl is a site-relative path such as "/heppa/..."
    html = urlopen("https://heppa.hippos.fi" + pageUrl)
    bsobj = BeautifulSoup(html, 'lxml')
    # Only <a> tags whose href starts with /heppa/
    for link in bsobj.find_all("a", href=re.compile("^(/heppa/)")):
        if 'href' in link.attrs:
            if link.attrs['href'] not in pages:
                newPage = link.attrs['href']
                print(newPage)
                pages.add(newPage)
                # Visit the new page (depth-first recursion)
                getLinks(newPage)

getLinks("")
I'm new to Python and web scraping. I found this code somewhere and I'm trying to modify it so that it saves the links to a file, but I can't get it to work.
Please help me.
Thanks in advance.
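One possible way to save the links is sketched below. It is a minimal sketch, not a definitive fix, and it assumes BeautifulSoup and lxml are installed as in the code above. The names `heppa_links`, `crawl_to_file`, `max_pages` and the file name `links.txt` are hypothetical. The idea is to open the file once and write each new link in the same place where the original code prints it. It also swaps the recursion for a queue, so a large site does not hit Python's recursion limit.

```python
import re
from collections import deque
from urllib.request import urlopen

BASE_URL = "https://heppa.hippos.fi"
LINK_PATTERN = re.compile(r"^/heppa/")


def heppa_links(page_url):
    """Fetch one page and return the hrefs that start with /heppa/."""
    # Imported here so crawl_to_file() can be reused without BeautifulSoup.
    from bs4 import BeautifulSoup

    with urlopen(BASE_URL + page_url) as response:
        soup = BeautifulSoup(response, "lxml")
    return [a["href"] for a in soup.find_all("a", href=LINK_PATTERN)]


def crawl_to_file(start, get_links, out_path, max_pages=None):
    """Crawl from `start` and write every new link to `out_path`, one per line.

    `get_links(page)` must return the links found on `page`. A queue is used
    instead of recursion. `max_pages` (hypothetical) optionally caps how many
    pages are fetched. Returns the set of links that were written.
    """
    seen = set()
    queue = deque([start])
    fetched = 0
    with open(out_path, "w", encoding="utf-8") as out:
        while queue and (max_pages is None or fetched < max_pages):
            page = queue.popleft()
            fetched += 1
            for href in get_links(page):
                if href not in seen:
                    seen.add(href)
                    print(href)
                    out.write(href + "\n")  # one link per line
                    queue.append(href)
    return seen
```

For the site in the question, a call such as `crawl_to_file("", heppa_links, "links.txt", max_pages=50)` would print each link and also save it to `links.txt`. Because the file is opened with `with open(...)`, it is closed (and therefore flushed) even if the crawl stops with an error partway through.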
