Saving links as text - Python Forum (https://python-forum.io), Web Scraping & Web Development, thread /thread-23538.html
Saving links as text - hessu - Jan-04-2020

    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    import re

    pages = set()

    def getLinks(pageUrl):
        global pages
        html = urlopen("https://heppa.hippos.fi" + pageUrl)
        bsobj = BeautifulSoup(html, 'lxml')
        for link in bsobj.findAll("a", href=re.compile("^(/heppa/)")):
            if 'href' in link.attrs:
                if link.attrs['href'] not in pages:
                    newPage = link.attrs['href']
                    print(newPage)
                    pages.add(newPage)
                    getLinks(newPage)

    getLinks("")

I'm new to Python and web scraping. I found this code somewhere. I'm trying to modify it so that I can save the links to a file, but I can't get it to work. Please help me. Thanks in advance.

RE: Saving links as text - Larz60+ - Jan-05-2020

This code may have worked in the past (and still may). Saving should be simple, but the webpage is almost entirely JavaScript, so to scrape it properly you should use Selenium. There are two tutorials on this site that you should run through (it doesn't take long):
web scraping part 1
web scraping part 2
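For the "saving should be simple" part, here is a minimal sketch of one way to write the collected links to a text file once the crawl has finished. The helper name `save_links`, the default file name `links.txt`, and the sample paths are assumptions made for illustration; they are not from the original code. It does not address the JavaScript problem mentioned above, it only shows how a set like `pages` could be written out.

```python
from pathlib import Path


def save_links(links, filename="links.txt"):
    """Write each link on its own line, sorted so the output is stable."""
    path = Path(filename)
    text = "".join(f"{link}\n" for link in sorted(links))
    path.write_text(text, encoding="utf-8")
    return path


# Stand-in for the `pages` set filled by getLinks(); these paths are made up.
pages = {"/heppa/example-b", "/heppa/example-a"}
save_links(pages)
```

In the original script this would amount to calling `save_links(pages)` after `getLinks("")` returns. If the crawl might be interrupted, appending each link to the file as it is found (opening the file with mode `"a"`) may be a safer choice than writing everything at the end.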