Python Forum
How do I avoid Beautiful Soup redirects?
#8
The redirect no longer happens when I paste the URL into my browser's address bar. However, the page urllib fetches for BeautifulSoup is still being redirected, because the results I get back do not all match my selected 'NYSE' exchange filter. Let me show you what I mean.

This is my input code:
>>> import bs4 as bs
>>> import urllib.request
>>> sauce = urllib.request.urlopen('http://globenewswire.com/Search/NewsSearch?exchange=NYSE').read()
>>> soup = bs.BeautifulSoup(sauce,'lxml')
>>> urls = []
>>> for div in soup.find_all('div', class_='results-link', limit=10):
	urls.append('https://globenewswire.com' + div.h1.a['href'])

>>> a, b, c, d, e, f, g, h, i, j = urls
>>> while True:
	saucea = urllib.request.urlopen(a).read()
	soupa = bs.BeautifulSoup(saucea,'lxml')
	sauceb = urllib.request.urlopen(b).read()
	soupb = bs.BeautifulSoup(sauceb,'lxml')
	saucec = urllib.request.urlopen(c).read()
	soupc = bs.BeautifulSoup(saucec,'lxml')
	sauced = urllib.request.urlopen(d).read()
	soupd = bs.BeautifulSoup(sauced,'lxml')
	saucee = urllib.request.urlopen(e).read()
	soupe = bs.BeautifulSoup(saucee,'lxml')
	saucef = urllib.request.urlopen(f).read()
	soupf = bs.BeautifulSoup(saucef,'lxml')
	sauceg = urllib.request.urlopen(g).read()
	soupg = bs.BeautifulSoup(sauceg,'lxml')
	sauceh = urllib.request.urlopen(h).read()
	souph = bs.BeautifulSoup(sauceh,'lxml')
	saucei = urllib.request.urlopen(i).read()
	soupi = bs.BeautifulSoup(saucei,'lxml')
	saucej = urllib.request.urlopen(j).read()
	soupj = bs.BeautifulSoup(saucej,'lxml')
	desca = soupa.find_all(attrs={"name":"ticker"}, limit=1)
	tickeraraw = (desca[0]['content'].encode('utf-8'))
	decodedtickera = tickeraraw.decode('utf')
	soupatitle = soupa.title.text
	descb = soupb.find_all(attrs={"name":"ticker"}, limit=1)
	tickerbraw = (descb[0]['content'].encode('utf-8'))
	decodedtickerb = tickerbraw.decode('utf')
	soupbtitle = soupb.title.text
	descc = soupc.find_all(attrs={"name":"ticker"}, limit=1)
	tickercraw = (descc[0]['content'].encode('utf-8'))
	decodedtickerc = tickercraw.decode('utf')
	soupctitle = soupc.title.text
	descd = soupd.find_all(attrs={"name":"ticker"}, limit=1)
	tickerdraw = (descd[0]['content'].encode('utf-8'))
	decodedtickerd = tickerdraw.decode('utf')
	soupdtitle = soupd.title.text
	desce = soupe.find_all(attrs={"name":"ticker"}, limit=1)
	tickereraw = (desce[0]['content'].encode('utf-8'))
	decodedtickere = tickereraw.decode('utf')
	soupetitle = soupe.title.text
	descf = soupf.find_all(attrs={"name":"ticker"}, limit=1)
	tickerfraw = (descf[0]['content'].encode('utf-8'))
	decodedtickerf = tickerfraw.decode('utf')
	soupftitle = soupf.title.text
	descg = soupg.find_all(attrs={"name":"ticker"}, limit=1)
	tickergraw = (descg[0]['content'].encode('utf-8'))
	decodedtickerg = tickergraw.decode('utf')
	soupgtitle = soupg.title.text
	desch = souph.find_all(attrs={"name":"ticker"}, limit=1)
	tickerhraw = (desch[0]['content'].encode('utf-8'))
	decodedtickerh = tickerhraw.decode('utf')
	souphtitle = souph.title.text
	desci = soupi.find_all(attrs={"name":"ticker"}, limit=1) 
	tickeriraw = (desci[0]['content'].encode('utf-8'))
	decodedtickeri = tickeriraw.decode('utf')
	soupititle = soupi.title.text
	descj = soupj.find_all(attrs={"name":"ticker"}, limit=1)
	tickerjraw = (descj[0]['content'].encode('utf-8'))
	decodedtickerj = tickerjraw.decode('utf')
	soupjtitle = soupj.title.text
	break
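As an aside, the ten near-identical fetch-and-parse blocks above can be collapsed into a loop. Here is a minimal sketch, assuming each article page keeps the same `<meta name="ticker" content="...">` tag; the `extract_ticker`/`fetch_ticker` helper names are my own, and I've used Python's built-in `html.parser` so the sketch doesn't depend on lxml being installed:

```python
import urllib.request
import bs4 as bs

def extract_ticker(html):
    """Pull the ticker string out of the <meta name="ticker"> tag, if present."""
    soup = bs.BeautifulSoup(html, 'html.parser')
    tag = soup.find(attrs={"name": "ticker"})
    return tag['content'] if tag else None

def fetch_ticker(url):
    """Download one article page and return its ticker string."""
    html = urllib.request.urlopen(url).read()
    return extract_ticker(html)

# One list comprehension replaces all ten a..j blocks:
# tickers = [fetch_ticker(u) for u in urls]
```

With that, adding an eleventh result means changing `limit=10`, not pasting another block.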
Then I look at the parsed results. Printing each stock ticker also shows its exchange. They should all be listed on the NYSE, because that is my search criterion, yet these are my results:
>>> print(decodedtickera)
NYSE:PGH, TSX:PGF
>>> print(decodedtickerb)
TSX-V:TIC
>>> print(decodedtickerc)

>>> print(decodedtickerd)

>>> print(decodedtickere)
NYSE:BSCI, NYSE:BSCJ, NYSE:BSCK, NYSE:BSCH, NYSE:GSY, NYSE:BSCL, NYSE:BSCM, NYSE:BSCN, NYSE:BSCO, NYSE:BSCP, NYSE:GTO, NYSE:BSCQ

>>> print(decodedtickerf)

>>> print(decodedtickerg)
Nasdaq:BMTC, Nasdaq:RBPAA
>>> print(decodedtickerh)
Nasdaq:VBTX
>>> print(decodedtickeri)
TSX:XAU, TSX-V:AGX-H.V
>>> print(decodedtickerj)
I know that my request is being redirected from the URL with the search criteria (http://globenewswire.com/Search/NewsSear...hange=NYSE) to the main page (http://globenewswire.com/NewsRoom), because not every result carries an 'NYSE' ticker: the search returns some stocks from the Nasdaq exchange and others from the TSX and TSX-V exchanges. The redirect stopped happening in my browser, yet the pages fetched for BeautifulSoup are still being redirected.
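One way to see, and often avoid, this is to send a browser-like User-Agent header and compare the URL you asked for with the URL the server actually served: `response.geturl()` reports the post-redirect address. This is only a sketch, not a guaranteed fix; the helper names and the particular User-Agent string are my own, and some sites key the redirect on cookies or other headers instead:

```python
import urllib.request

SEARCH_URL = 'http://globenewswire.com/Search/NewsSearch?exchange=NYSE'

def make_request(url):
    """Build a request that looks like it came from a desktop browser."""
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
    return urllib.request.Request(url, headers=headers)

def fetch_checking_redirect(url):
    """Fetch a page and report whether the server redirected us elsewhere."""
    response = urllib.request.urlopen(make_request(url))
    final_url = response.geturl()  # the URL after any redirects were followed
    if final_url != url:
        print('redirected to', final_url)
    return response.read()
```

If `fetch_checking_redirect(SEARCH_URL)` prints a `NewsRoom` address, you have confirmed the server is redirecting your script even though your browser is not.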
Messages In This Thread
RE: How do I avoid Beautiful Soup redirects? - by HiImNew - Dec-01-2017, 11:34 PM