Feb-24-2018, 05:12 PM
Larz60+'s code works for me, though I have not tested it with other links.
Start simple; here is the basic setup.
Then you can add error handling/testing, or not (many drop it in web scraping).
from bs4 import BeautifulSoup
import requests

url = 'https://www.python.org/'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'lxml')
# Select every <a> tag, then keep only the absolute links
for link in soup.select('a'):
    href = link.get('href')
    # href is None when an <a> tag has no href attribute
    if href and href.startswith('http'):
        print(href)
Output:
https://docs.python.org
https://pypi.python.org/
http://plus.google.com/+Python
http://www.facebook.com/pythonlang?fref=ts
http://twitter.com/ThePSF
http://brochure.getpython.info/
... etc.
So this gets the links by using a CSS selector; soup.find_all('a') could have been used instead.
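To show that the two approaches give the same result, here is a small sketch that runs both on an inline HTML snippet instead of a live page. The sample markup and the two helper function names are made up for illustration, and it uses the built-in 'html.parser' so it runs without lxml installed. The select version pushes the http filter into the selector itself with a[href^="http"].

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used here instead of fetching a real page
html = '''
<a href="https://docs.python.org">Docs</a>
<a href="/about/">About</a>
<a name="top">No href</a>
<a href="http://twitter.com/ThePSF">Twitter</a>
'''


def http_links_select(soup):
    # CSS attribute selector: <a> tags whose href starts with "http"
    return [a['href'] for a in soup.select('a[href^="http"]')]


def http_links_find_all(soup):
    # href=True skips <a> tags that have no href attribute at all
    return [a['href'] for a in soup.find_all('a', href=True)
            if a['href'].startswith('http')]


soup = BeautifulSoup(html, 'html.parser')
print(http_links_select(soup))
print(http_links_find_all(soup))
```

Both calls print the same two absolute links; the relative link and the anchor without an href are skipped.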
The startswith('http') check filters the result, so only links that begin with http are kept.
I have a tutorial here, part-2.
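If you do want the error handling mentioned at the start, one possible way to wrap the basic setup could look like this. It is only a sketch: the function name get_http_links, the 10 second timeout and the choice to return an empty list on failure are my assumptions, not something from Larz60+'s code. It uses 'html.parser' so it runs without lxml; swap in 'lxml' if you have it installed.

```python
import requests
from bs4 import BeautifulSoup


def get_http_links(url, timeout=10):
    """Return the absolute http(s) links found at url, or [] if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        # Turn 4xx/5xx status codes into an exception
        response.raise_for_status()
    except requests.RequestException as error:
        # Covers connection errors, timeouts, malformed URLs and HTTP errors
        print(f'Request failed: {error}')
        return []
    soup = BeautifulSoup(response.content, 'html.parser')
    # Keep only <a> tags whose href starts with "http"
    return [a['href'] for a in soup.select('a[href^="http"]')]
```

Calling get_http_links('https://www.python.org/') should give the same kind of list as the loop above, and a bad URL or a dead site gives an empty list with a message instead of a traceback.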