Saving links as text

hessu · Jan-04-2020, 06:19 PM

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

# Relative links already visited, shared across the recursive calls
pages = set()


def getLinks(pageUrl):
    global pages
    html = urlopen("https://heppa.hippos.fi" + pageUrl)
    bsobj = BeautifulSoup(html, 'lxml')
    # Only follow links whose href starts with /heppa/
    for link in bsobj.find_all("a", href=re.compile("^(/heppa/)")):
        if 'href' in link.attrs:
            if link.attrs['href'] not in pages:
                # New page: print it, remember it, then crawl it
                newPage = link.attrs['href']
                print(newPage)
                pages.add(newPage)
                getLinks(newPage)


getLinks("")

I'm new to Python and web scraping. I found this code somewhere and I'm trying to modify it so that the links are saved to a text file instead of only being printed, but I can't get it to work.

Please help me.
Thanks in advance.
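
One possible way to do this, as a minimal sketch rather than a definitive answer: let the crawl finish, then write the collected set to a file, one link per line. The helper name save_links and the default file name links.txt are assumptions made up for this illustration; only pages and getLinks come from the code above.

```python
from pathlib import Path


def save_links(links, filename="links.txt"):
    """Write each link on its own line and return how many were written.

    `save_links` and the default file name are hypothetical names chosen
    for this sketch; they are not part of the original script.
    """
    # A set has no fixed order, so sort to get a reproducible file.
    ordered = sorted(links)
    # One link per line, each terminated by a newline.
    text = "".join(link + "\n" for link in ordered)
    Path(filename).write_text(text, encoding="utf-8")
    return len(ordered)
```

With this in the same script, calling save_links(pages) after getLinks("") has returned should write every link that was printed. Note that nothing is written if the crawl is interrupted partway through; opening the file in append mode and writing each link as it is found would be an alternative if that matters.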
