Python Forum
Need help with Web Parsing and Lists
You can just append each URL to a list on every iteration of the loop:

import bs4 as bs
import urllib.request

# Download the search results page and parse it with lxml
sauce = urllib.request.urlopen('https://globenewswire.com/Search/NewsSearch?lang=en&exchange=Nasdaq').read()
soup = bs.BeautifulSoup(sauce, 'lxml')

lst = []  # collects one absolute URL per search result
for div in soup.find_all('div', class_='results-link'):
    # the href is site-relative, so prefix it with the domain
    url = 'https://globenewswire.com{}'.format(div.h1.a['href'])
    lst.append(url)

print(lst)
Output:
['https://globenewswire.com/news-release/2017/11/18/1197161/0/en/Veritas-Pharma-Enters-Binding-Letter-of-Intent-to-Secure-ACMPR-License-and-Cannabis-Growing-Facility.html', 'https://globenewswire.com/news-release/2017/11/18/1197160/0/en/IT-Genium-INET-Successfully-Upgraded-to-5-0-0201.html', 'https://globenewswire.com/news-release/2017/11/18/1197159/0/en/IT-Genium-INET-Successfully-Upgraded-to-5-0-0201.html', 'https://globenewswire.com/news-release/2017/11/18/1197158/0/en/IT-INET-Nordic-Production-Successfully-upgraded-to-the-November-20-release-82-17.html', 'https://globenewswire.com/news-release/2017/11/18/1197157/0/en/IT-Genium-INET-Successfully-Upgraded-to-5-0-0201.html', 'https://globenewswire.com/news-release/2017/11/18/1195106/0/en/Aerojet-Rocketdyne-Supports-ULA-Delta-II-Launch-of-Joint-Polar-Satellite-System-1.html', 'https://globenewswire.com/news-release/2017/11/18/1195105/0/en/Voting-for-Stars-of-Science-Season-9-Finale-Opens.html', 'https://globenewswire.com/news-release/2017/11/18/1195104/0/en/SHAREHOLDER-ALERT-Pomerantz-Law-Firm-Reminds-Shareholders-with-Losses-on-their-Investment-in-Intercept-Pharmaceuticals-Inc-of-Class-Action-Lawsuit-and-Upcoming-Deadline-ICPT.html', 'https://globenewswire.com/news-release/2017/11/18/1195103/0/is/Hampi%C3%B0jan-l%C3%BDkur-vi%C3%B0-kaup-%C3%A1-Voot-Beitu.html', 'https://globenewswire.com/news-release/2017/11/18/1195102/0/en/Best-Fitbit-Black-Friday-Cyber-Monday-Deals-of-2017-Compared-by-Deal-Tomato.html']
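The same list can also be built in one step with a list comprehension. Here is a minimal sketch of that idea, assuming the hrefs have already been pulled out of the soup. The `absolute_urls` helper and the sample hrefs are made up for illustration, and `urljoin` is swapped in for the string formatting used above because it also copes with hrefs that are already absolute.

```python
from urllib.parse import urljoin

BASE_URL = 'https://globenewswire.com'


def absolute_urls(hrefs, base=BASE_URL):
    """Return a list of absolute URLs built from (possibly relative) hrefs."""
    # hypothetical helper, not part of bs4 or urllib
    return [urljoin(base, href) for href in hrefs]


# Made-up hrefs, shaped like the values of div.h1.a['href']
sample_hrefs = [
    '/news-release/2017/11/18/1197161/0/en/Example-One.html',
    '/news-release/2017/11/18/1197160/0/en/Example-Two.html',
]

urls = absolute_urls(sample_hrefs)
print(urls)
```

With the real page you would pass in something like `(div.h1.a['href'] for div in soup.find_all('div', class_='results-link'))` instead of `sample_hrefs`.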
If you want the list pretty-printed, use the pprint module:
import bs4 as bs
import urllib.request
import pprint
sauce = urllib.request.urlopen('https://globenewswire.com/Search/NewsSearch?lang=en&exchange=Nasdaq').read()
soup = bs.BeautifulSoup(sauce,'lxml')
lst = []
for div in soup.find_all('div', class_='results-link'):
    url = 'https://globenewswire.com{}'.format(div.h1.a['href'])
    lst.append(url)
    
# pprint puts each item on its own line when the list does not fit on one line
pprint.pprint(lst)
Output:
['https://globenewswire.com/news-release/2017/11/18/1197161/0/en/Veritas-Pharma-Enters-Binding-Letter-of-Intent-to-Secure-ACMPR-License-and-Cannabis-Growing-Facility.html',
 'https://globenewswire.com/news-release/2017/11/18/1197160/0/en/IT-Genium-INET-Successfully-Upgraded-to-5-0-0201.html',
 'https://globenewswire.com/news-release/2017/11/18/1197159/0/en/IT-Genium-INET-Successfully-Upgraded-to-5-0-0201.html',
 'https://globenewswire.com/news-release/2017/11/18/1197158/0/en/IT-INET-Nordic-Production-Successfully-upgraded-to-the-November-20-release-82-17.html',
 'https://globenewswire.com/news-release/2017/11/18/1197157/0/en/IT-Genium-INET-Successfully-Upgraded-to-5-0-0201.html',
 'https://globenewswire.com/news-release/2017/11/18/1195106/0/en/Aerojet-Rocketdyne-Supports-ULA-Delta-II-Launch-of-Joint-Polar-Satellite-System-1.html',
 'https://globenewswire.com/news-release/2017/11/18/1195105/0/en/Voting-for-Stars-of-Science-Season-9-Finale-Opens.html',
 'https://globenewswire.com/news-release/2017/11/18/1195104/0/en/SHAREHOLDER-ALERT-Pomerantz-Law-Firm-Reminds-Shareholders-with-Losses-on-their-Investment-in-Intercept-Pharmaceuticals-Inc-of-Class-Action-Lawsuit-and-Upcoming-Deadline-ICPT.html',
 'https://globenewswire.com/news-release/2017/11/18/1195103/0/is/Hampi%C3%B0jan-l%C3%BDkur-vi%C3%B0-kaup-%C3%A1-Voot-Beitu.html',
 'https://globenewswire.com/news-release/2017/11/18/1195102/0/en/Best-Fitbit-Black-Friday-Cyber-Monday-Deals-of-2017-Compared-by-Deal-Tomato.html']
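If you need the formatted text as a string (to write to a file or log, say) rather than printed straight away, `pprint.pformat` returns the same text that `pprint.pprint` would print. A small sketch, using two made-up URLs in place of the scraped list:

```python
import pprint

# Made-up URLs standing in for the scraped list
urls = [
    'https://globenewswire.com/news-release/2017/11/18/1197161/0/en/Example-One.html',
    'https://globenewswire.com/news-release/2017/11/18/1197160/0/en/Example-Two.html',
]

# pformat returns the pretty-printed text instead of printing it
formatted = pprint.pformat(urls)
print(formatted)

# If you only want the bare URLs, one per line, without the list brackets and quotes
plain = '\n'.join(urls)
print(plain)
```

Which one to use is mostly a matter of taste: the `pformat` text is still a valid Python list literal, while the joined string is easier to paste elsewhere.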


Messages In This Thread
Need help with Web Parsing and Lists - by HiImNew - Nov-18-2017, 10:16 PM
RE: Need help with Web Parsing and Lists - by metulburr - Nov-18-2017, 10:46 PM
