Python Forum
Saving links as text
#1
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

# hrefs already seen, so no page is visited twice
pages = set()

def getLinks(pageUrl):
    """Recursively print every new /heppa/ link reachable from pageUrl."""
    html = urlopen("https://heppa.hippos.fi" + pageUrl)
    bsobj = BeautifulSoup(html, 'lxml')
    # Only anchors whose href starts with /heppa/ are followed
    for link in bsobj.find_all("a", href=re.compile("^/heppa/")):
        newPage = link.attrs['href']
        if newPage not in pages:
            print(newPage)
            pages.add(newPage)
            getLinks(newPage)

getLinks("")


I'm new to Python and web scraping. I found this code somewhere. I'm trying to modify it so that it saves the links to a file, but I can't get it to work.

Please help me.
Thanks in advance.
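
One possible way to do this, as a minimal sketch: let the crawl fill the `pages` set as it does now, then write the set to a text file once at the end. The helper name `save_links` and the default file name `links.txt` are assumptions made for illustration; neither appears in the code above.

```python
from pathlib import Path


def save_links(links, filename="links.txt"):
    """Write each link on its own line and return the path that was written.

    `save_links` and the default "links.txt" are hypothetical names chosen
    for this sketch. The links are sorted so the file content is stable
    between runs, since a set has no fixed order.
    """
    path = Path(filename)
    text = "".join(f"{link}\n" for link in sorted(links))
    path.write_text(text, encoding="utf-8")
    return path
```

With the script above, this would be called as `save_links(pages)` on the line after `getLinks("")`. Writing once at the end keeps the file handling out of the recursive function, but nothing is saved if the crawl stops early with an error. If that matters, opening the file in append mode and writing each link as it is found is an alternative.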


Messages In This Thread
Saving links as text - by hessu - Jan-04-2020, 06:19 PM
RE: Saving links as text - by Larz60+ - Jan-05-2020, 09:29 AM
