Need help with Web Parsing and Lists

HiImNew · Nov-18-2017, 10:16 PM

This is my code:

import bs4 as bs
import urllib.request
sauce = urllib.request.urlopen('https://globenewswire.com/Search/NewsSearch?lang=en&exchange=Nasdaq').read()
soup = bs.BeautifulSoup(sauce,'lxml')
for div in soup.find_all('div', class_='results-link'):
    str = ('https://globenewswire.com' + div.h1.a['href'])
    List = str.splitlines()
    print(List)

['https://globenewswire.com/news-release/2017/11/18/1197160/0/en/IT-Genium-INET-Successfully-Upgraded-to-5-0-0201.html']
['https://globenewswire.com/news-release/2017/11/18/1197159/0/en/IT-Genium-INET-Successfully-Upgraded-to-5-0-0201.html']
['https://globenewswire.com/news-release/2017/11/18/1197158/0/en/IT-INET-Nordic-Production-Successfully-upgraded-to-the-November-20-release-82-17.html']
['https://globenewswire.com/news-release/2017/11/18/1197157/0/en/IT-Genium-INET-Successfully-Upgraded-to-5-0-0201.html']
['https://globenewswire.com/news-release/2017/11/18/1195106/0/en/Aerojet-Rocketdyne-Supports-ULA-Delta-II-Launch-of-Joint-Polar-Satellite-System-1.html']
['https://globenewswire.com/news-release/2017/11/18/1195105/0/en/Voting-for-Stars-of-Science-Season-9-Finale-Opens.html']
['https://globenewswire.com/news-release/2017/11/18/1195104/0/en/SHAREHOLDER-ALERT-Pomerantz-Law-Firm-Reminds-Shareholders-with-Losses-on-their-Investment-in-Intercept-Pharmaceuticals-Inc-of-Class-Action-Lawsuit-and-Upcoming-Deadline-ICPT.html']
['https://globenewswire.com/news-release/2017/11/18/1195103/0/is/Hampi%C3%B0jan-l%C3%BDkur-vi%C3%B0-kaup-%C3%A1-Voot-Beitu.html']
['https://globenewswire.com/news-release/2017/11/18/1195102/0/en/Best-Fitbit-Black-Friday-Cyber-Monday-Deals-of-2017-Compared-by-Deal-Tomato.html']
['https://globenewswire.com/news-release/2017/11/18/1195101/0/en/The-Best-Canon-DSLR-Camera-Black-Friday-2017-Deals-Topic-Reviews-Publish-Round-Up-of-Top-Deals.html']

I've tried

 str.splitlines()

but that just gives me 10 separate lists, each with one URL in it. How do I put all 10 of these URLs into a single list? There would be only two brackets and nine commas, so the list would look like this:

['https://globenewswire.com/news-release/2017/11/18/1197160/0/en/IT-Genium-INET-Successfully-Upgraded-to-5-0-0201.html',
'https://globenewswire.com/news-release/2017/11/18/1197159/0/en/IT-Genium-INET-Successfully-Upgraded-to-5-0-0201.html',
'https://globenewswire.com/news-release/2017/11/18/1197158/0/en/IT-INET-Nordic-Production-Successfully-upgraded-to-the-November-20-release-82-17.html',
'https://globenewswire.com/news-release/2017/11/18/1197157/0/en/IT-Genium-INET-Successfully-Upgraded-to-5-0-0201.html',
'https://globenewswire.com/news-release/2017/11/18/1195106/0/en/Aerojet-Rocketdyne-Supports-ULA-Delta-II-Launch-of-Joint-Polar-Satellite-System-1.html',
'https://globenewswire.com/news-release/2017/11/18/1195105/0/en/Voting-for-Stars-of-Science-Season-9-Finale-Opens.html',
'https://globenewswire.com/news-release/2017/11/18/1195104/0/en/SHAREHOLDER-ALERT-Pomerantz-Law-Firm-Reminds-Shareholders-with-Losses-on-their-Investment-in-Intercept-Pharmaceuticals-Inc-of-Class-Action-Lawsuit-and-Upcoming-Deadline-ICPT.html',
'https://globenewswire.com/news-release/2017/11/18/1195103/0/is/Hampi%C3%B0jan-l%C3%BDkur-vi%C3%B0-kaup-%C3%A1-Voot-Beitu.html',
'https://globenewswire.com/news-release/2017/11/18/1195102/0/en/Best-Fitbit-Black-Friday-Cyber-Monday-Deals-of-2017-Compared-by-Deal-Tomato.html',
'https://globenewswire.com/news-release/2017/11/18/1195101/0/en/The-Best-Canon-DSLR-Camera-Black-Friday-2017-Deals-Topic-Reviews-Publish-Round-Up-of-Top-Deals.html']

I want all the URLs in one list so that I can assign each individual URL to its own variable:

a, b, c, d, e, f, g, h, i, j = List

Any help is appreciated. Thank you.

metulburr · Nov-18-2017, 10:46 PM

You can just append the URL to a list on each iteration:

import bs4 as bs
import urllib.request
sauce = urllib.request.urlopen('https://globenewswire.com/Search/NewsSearch?lang=en&exchange=Nasdaq').read()
soup = bs.BeautifulSoup(sauce,'lxml')
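lst = []  # one list, created once, before the loop
for div in soup.find_all('div', class_='results-link'):
    url = 'https://globenewswire.com{}'.format(div.h1.a['href'])
    lst.append(url)  # add this URL to the same list on every pass

print(lst)  # print once, after the loop has finished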

Output:
['https://globenewswire.com/news-release/2017/11/18/1197161/0/en/Veritas-Pharma-Enters-Binding-Letter-of-Intent-to-Secure-ACMPR-License-and-Cannabis-Growing-Facility.html', 'https://globenewswire.com/news-release/2017/11/18/1197160/0/en/IT-Genium-INET-Successfully-Upgraded-to-5-0-0201.html', 'https://globenewswire.com/news-release/2017/11/18/1197159/0/en/IT-Genium-INET-Successfully-Upgraded-to-5-0-0201.html', 'https://globenewswire.com/news-release/2017/11/18/1197158/0/en/IT-INET-Nordic-Production-Successfully-upgraded-to-the-November-20-release-82-17.html', 'https://globenewswire.com/news-release/2017/11/18/1197157/0/en/IT-Genium-INET-Successfully-Upgraded-to-5-0-0201.html', 'https://globenewswire.com/news-release/2017/11/18/1195106/0/en/Aerojet-Rocketdyne-Supports-ULA-Delta-II-Launch-of-Joint-Polar-Satellite-System-1.html', 'https://globenewswire.com/news-release/2017/11/18/1195105/0/en/Voting-for-Stars-of-Science-Season-9-Finale-Opens.html', 'https://globenewswire.com/news-release/2017/11/18/1195104/0/en/SHAREHOLDER-ALERT-Pomerantz-Law-Firm-Reminds-Shareholders-with-Losses-on-their-Investment-in-Intercept-Pharmaceuticals-Inc-of-Class-Action-Lawsuit-and-Upcoming-Deadline-ICPT.html', 'https://globenewswire.com/news-release/2017/11/18/1195103/0/is/Hampi%C3%B0jan-l%C3%BDkur-vi%C3%B0-kaup-%C3%A1-Voot-Beitu.html', 'https://globenewswire.com/news-release/2017/11/18/1195102/0/en/Best-Fitbit-Black-Friday-Cyber-Monday-Deals-of-2017-Compared-by-Deal-Tomato.html']
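
The same accumulation can also be written as a list comprehension. This is only a minimal sketch, not a replacement for the code above: it assumes the relative links have already been pulled out of the page, so the `hrefs` values are shortened, hypothetical stand-ins for `div.h1.a['href']`, and `urljoin` from the standard library is swapped in for the string formatting.

```python
from urllib.parse import urljoin

BASE = 'https://globenewswire.com'

# Hypothetical relative links, standing in for the div.h1.a['href'] values.
hrefs = [
    '/news-release/2017/11/18/1197160/0/en/first.html',
    '/news-release/2017/11/18/1197159/0/en/second.html',
    '/news-release/2017/11/18/1197158/0/en/third.html',
]

# Build the whole list in one expression instead of appending in a loop.
urls = [urljoin(BASE, href) for href in hrefs]

print(urls)
```

With the real page, the comprehension would iterate over `soup.find_all('div', class_='results-link')` and read each `div.h1.a['href']` in place of `hrefs`.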

If you want the list pretty-printed, use pprint:

import bs4 as bs
import urllib.request
import pprint
sauce = urllib.request.urlopen('https://globenewswire.com/Search/NewsSearch?lang=en&exchange=Nasdaq').read()
soup = bs.BeautifulSoup(sauce,'lxml')
lst = []
for div in soup.find_all('div', class_='results-link'):
    url = 'https://globenewswire.com{}'.format(div.h1.a['href'])
    lst.append(url)
    
pprint.pprint(lst)

Output:
['https://globenewswire.com/news-release/2017/11/18/1197161/0/en/Veritas-Pharma-Enters-Binding-Letter-of-Intent-to-Secure-ACMPR-License-and-Cannabis-Growing-Facility.html',
 'https://globenewswire.com/news-release/2017/11/18/1197160/0/en/IT-Genium-INET-Successfully-Upgraded-to-5-0-0201.html',
 'https://globenewswire.com/news-release/2017/11/18/1197159/0/en/IT-Genium-INET-Successfully-Upgraded-to-5-0-0201.html',
 'https://globenewswire.com/news-release/2017/11/18/1197158/0/en/IT-INET-Nordic-Production-Successfully-upgraded-to-the-November-20-release-82-17.html',
 'https://globenewswire.com/news-release/2017/11/18/1197157/0/en/IT-Genium-INET-Successfully-Upgraded-to-5-0-0201.html',
 'https://globenewswire.com/news-release/2017/11/18/1195106/0/en/Aerojet-Rocketdyne-Supports-ULA-Delta-II-Launch-of-Joint-Polar-Satellite-System-1.html',
 'https://globenewswire.com/news-release/2017/11/18/1195105/0/en/Voting-for-Stars-of-Science-Season-9-Finale-Opens.html',
 'https://globenewswire.com/news-release/2017/11/18/1195104/0/en/SHAREHOLDER-ALERT-Pomerantz-Law-Firm-Reminds-Shareholders-with-Losses-on-their-Investment-in-Intercept-Pharmaceuticals-Inc-of-Class-Action-Lawsuit-and-Upcoming-Deadline-ICPT.html',
 'https://globenewswire.com/news-release/2017/11/18/1195103/0/is/Hampi%C3%B0jan-l%C3%BDkur-vi%C3%B0-kaup-%C3%A1-Voot-Beitu.html',
 'https://globenewswire.com/news-release/2017/11/18/1195102/0/en/Best-Fitbit-Black-Friday-Cyber-Monday-Deals-of-2017-Compared-by-Deal-Tomato.html']
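
On the original goal of giving each URL its own name: `a, b, c, d, e, f, g, h, i, j = lst` only works when the list holds exactly as many items as there are names, and the search page may not always return ten results. The following is a hedged sketch using a short, hypothetical list in place of the scraped one; it shows indexing and starred unpacking as alternatives that tolerate a different length.

```python
# Hypothetical stand-in for the scraped list; the real one holds full URLs.
lst = ['url-0', 'url-1', 'url-2', 'url-3']

# Indexing works for any non-empty list and needs no extra names.
first = lst[0]
last = lst[-1]

# Starred unpacking names the items you care about and collects the rest.
a, b, *rest = lst

# Plain unpacking raises ValueError when the counts do not match.
try:
    x, y, z = lst
except ValueError as exc:
    error = str(exc)

print(first, last, a, b, rest)
print(error)
```

In many cases it may be simpler to keep the list and loop over it, or index into it, than to create ten separate variables.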

HiImNew · (This post was last modified: Nov-18-2017, 11:19 PM by HiImNew.)

Thank you for the help again.
