Python Forum
How do I avoid Beautiful Soup redirects?
#8
The redirect no longer happens when I paste the URL into my browser's address bar. However, the page urllib fetches for BeautifulSoup is still being redirected, because the results I get back do not all match my selected 'NYSE' exchange filter. Let me show you what I mean.

This is my input code:
>>> import bs4 as bs
>>> import urllib.request
>>> sauce = urllib.request.urlopen('http://globenewswire.com/Search/NewsSearch?exchange=NYSE').read()
>>> soup = bs.BeautifulSoup(sauce,'lxml')
>>> urls = []
>>> for div in soup.find_all('div', class_='results-link', limit=10):
	urls.append('https://globenewswire.com' + div.h1.a['href'])

>>> a, b, c, d, e, f, g, h, i, j = urls
>>> while True:
	saucea = urllib.request.urlopen(a).read()
	soupa = bs.BeautifulSoup(saucea,'lxml')
	sauceb = urllib.request.urlopen(b).read()
	soupb = bs.BeautifulSoup(sauceb,'lxml')
	saucec = urllib.request.urlopen(c).read()
	soupc = bs.BeautifulSoup(saucec,'lxml')
	sauced = urllib.request.urlopen(d).read()
	soupd = bs.BeautifulSoup(sauced,'lxml')
	saucee = urllib.request.urlopen(e).read()
	soupe = bs.BeautifulSoup(saucee,'lxml')
	saucef = urllib.request.urlopen(f).read()
	soupf = bs.BeautifulSoup(saucef,'lxml')
	sauceg = urllib.request.urlopen(g).read()
	soupg = bs.BeautifulSoup(sauceg,'lxml')
	sauceh = urllib.request.urlopen(h).read()
	souph = bs.BeautifulSoup(sauceh,'lxml')
	saucei = urllib.request.urlopen(i).read()
	soupi = bs.BeautifulSoup(saucei,'lxml')
	saucej = urllib.request.urlopen(j).read()
	soupj = bs.BeautifulSoup(saucej,'lxml')
	desca = soupa.find_all(attrs={"name":"ticker"}, limit=1)
	tickeraraw = (desca[0]['content'].encode('utf-8'))
	decodedtickera = tickeraraw.decode('utf')
	soupatitle = soupa.title.text
	descb = soupb.find_all(attrs={"name":"ticker"}, limit=1)
	tickerbraw = (descb[0]['content'].encode('utf-8'))
	decodedtickerb = tickerbraw.decode('utf')
	soupbtitle = soupb.title.text
	descc = soupc.find_all(attrs={"name":"ticker"}, limit=1)
	tickercraw = (descc[0]['content'].encode('utf-8'))
	decodedtickerc = tickercraw.decode('utf')
	soupctitle = soupc.title.text
	descd = soupd.find_all(attrs={"name":"ticker"}, limit=1)
	tickerdraw = (descd[0]['content'].encode('utf-8'))
	decodedtickerd = tickerdraw.decode('utf')
	soupdtitle = soupd.title.text
	desce = soupe.find_all(attrs={"name":"ticker"}, limit=1)
	tickereraw = (desce[0]['content'].encode('utf-8'))
	decodedtickere = tickereraw.decode('utf')
	soupetitle = soupe.title.text
	descf = soupf.find_all(attrs={"name":"ticker"}, limit=1)
	tickerfraw = (descf[0]['content'].encode('utf-8'))
	decodedtickerf = tickerfraw.decode('utf')
	soupftitle = soupf.title.text
	descg = soupg.find_all(attrs={"name":"ticker"}, limit=1)
	tickergraw = (descg[0]['content'].encode('utf-8'))
	decodedtickerg = tickergraw.decode('utf')
	soupgtitle = soupg.title.text
	desch = souph.find_all(attrs={"name":"ticker"}, limit=1)
	tickerhraw = (desch[0]['content'].encode('utf-8'))
	decodedtickerh = tickerhraw.decode('utf')
	souphtitle = souph.title.text
	desci = soupi.find_all(attrs={"name":"ticker"}, limit=1) 
	tickeriraw = (desci[0]['content'].encode('utf-8'))
	decodedtickeri = tickeriraw.decode('utf')
	soupititle = soupi.title.text
	descj = soupj.find_all(attrs={"name":"ticker"}, limit=1)
	tickerjraw = (descj[0]['content'].encode('utf-8'))
	decodedtickerj = tickerjraw.decode('utf')
	soupjtitle = soupj.title.text
	break
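As an aside, the ten near-identical fetch-and-parse blocks above can be collapsed into a loop. Here is a minimal sketch, assuming each article page keeps the same `<meta name="ticker" content="...">` tag; the `extract_ticker`/`fetch_ticker` helper names are my own, and I've used Python's built-in `html.parser` so the sketch doesn't depend on lxml being installed:

```python
import urllib.request
import bs4 as bs

def extract_ticker(html):
    """Pull the ticker string out of the <meta name="ticker"> tag, if present."""
    soup = bs.BeautifulSoup(html, 'html.parser')
    tag = soup.find(attrs={"name": "ticker"})
    return tag['content'] if tag else None

def fetch_ticker(url):
    """Download one article page and return its ticker string."""
    html = urllib.request.urlopen(url).read()
    return extract_ticker(html)

# One list comprehension replaces all ten a..j blocks:
# tickers = [fetch_ticker(u) for u in urls]
```

With that, adding an eleventh result means changing `limit=10`, not pasting another block.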
Then I look at the parsed results. Printing each stock ticker also shows its exchange. They should all be listed on the NYSE, because that is my search criterion, yet these are my results:
>>> print(decodedtickera)
NYSE:PGH, TSX:PGF
>>> print(decodedtickerb)
TSX-V:TIC
>>> print(decodedtickerc)

>>> print(decodedtickerd)

>>> print(decodedtickere)
NYSE:BSCI, NYSE:BSCJ, NYSE:BSCK, NYSE:BSCH, NYSE:GSY, NYSE:BSCL, NYSE:BSCM, NYSE:BSCN, NYSE:BSCO, NYSE:BSCP, NYSE:GTO, NYSE:BSCQ

>>> print(decodedtickerf)

>>> print(decodedtickerg)
Nasdaq:BMTC, Nasdaq:RBPAA
>>> print(decodedtickerh)
Nasdaq:VBTX
>>> print(decodedtickeri)
TSX:XAU, TSX-V:AGX-H.V
>>> print(decodedtickerj)
I know that my request is being redirected from the URL with the search criteria (http://globenewswire.com/Search/NewsSear...hange=NYSE) to the main page (http://globenewswire.com/NewsRoom), because not every result carries an 'NYSE' ticker: the search returns some stocks from the Nasdaq exchange and others from the TSX and TSX-V exchanges. The redirect stopped happening in my browser, yet the pages fetched for BeautifulSoup are still being redirected.
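One way to see, and often avoid, this is to send a browser-like User-Agent header and compare the URL you asked for with the URL the server actually served: `response.geturl()` reports the post-redirect address. This is only a sketch, not a guaranteed fix; the helper names and the particular User-Agent string are my own, and some sites key the redirect on cookies or other headers instead:

```python
import urllib.request

SEARCH_URL = 'http://globenewswire.com/Search/NewsSearch?exchange=NYSE'

def make_request(url):
    """Build a request that looks like it came from a desktop browser."""
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
    return urllib.request.Request(url, headers=headers)

def fetch_checking_redirect(url):
    """Fetch a page and report whether the server redirected us elsewhere."""
    response = urllib.request.urlopen(make_request(url))
    final_url = response.geturl()  # the URL after any redirects were followed
    if final_url != url:
        print('redirected to', final_url)
    return response.read()
```

If `fetch_checking_redirect(SEARCH_URL)` prints a `NewsRoom` address, you have confirmed the server is redirecting your script even though your browser is not.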
Messages In This Thread
RE: How do I avoid Beautiful Soup redirects? - by HiImNew - Dec-01-2017, 11:34 PM