Python Forum

I have tried everything I can think of, and Google searches aren't turning up much.

This is code to parse multiple pages of the same URL, but I keep getting this error.

Here is a fragment of the code that keeps giving the error. It is part of a larger program, but the Python traceback has narrowed the error down to this segment:

import bs4 as bs
import urllib.request
import time
while True:
	list = []
	listalmostfinal = []
	listfinal = []
	part_1 = 'https://www.businesswire.com/portal/site/home/template.PAGE/news/?javax.portlet.tpst=ccf123a93466ea4c882a06a9149550fd&javax.portlet.prp_ccf123a93466ea4c882a06a9149550fd_viewID=MY_PORTAL_VIEW&javax.portlet.prp_ccf123a93466ea4c882a06a9149550fd_ndmHsc=v2*A1515934800000*B1518565773469*DgroupByDate*G'
	part_2 = '*N1000003&javax.portlet.begCacheTok=com.vignette.cachetoken&javax.portlet.endCacheTok=com.vignette.cachetoken'
	page_counter = 1
	pi = 0
	url = ';-;'
	while True:
		page_flip = 'yes'
		for i in list:
			for j in worksheet_list:
				if i == j:
					page_flip = 'no'
		if page_flip == 'no':
			for i in list:
				if i not in listalmostfinal:
					listalmostfinal.append(i)
			if len(listalmostfinal) == 0:
				print('Error New Part')
			break
		else:
			page_counter += 1
			url = (part_1 + str(page_counter) + part_2)
			while True:
				try:
					sauce = urllib.request.urlopen(url).read()
					break
				except Exception:
					time.sleep(30)
					pi += 1
					if pi >= 5:
						print('BW Search Access Failure')
						print(pi)
			soup = bs.BeautifulSoup(sauce, 'lxml')
			for a in soup.find_all('a', class_='bwTitleLink', limit=25):
				initialbusinesswireurls = ('https://www.businesswire.com' + a['href'])
				list.append(initialbusinesswireurls)
			if page_counter > 5:
				print('Flipping Pages to Page ' + str(page_counter))

Here is the output:

Flipping Pages to Page 6
Flipping Pages to Page 7
Flipping Pages to Page 8
Flipping Pages to Page 9
Flipping Pages to Page 10
Traceback (most recent call last):
  File "<pyshell#3>", line 74, in <module>
    url = (part_1 + str(page_counter) + part_2)
TypeError: must be str, not ResultSet
>>>

I'm sure someone experienced with parsing knows what this is, but I have no clue. The "url = ';-;'" line was my attempt to reset the variable that I thought was causing the problem, since the code works the first time through, just not the second or third. If anyone knows what this is and/or how to fix it, please help. Thank you!
#EDIT: Forgot imports in code
#ANOTHER EDIT:

>>> str(page_counter)
[]

This should output a number, but for some reason it is just giving brackets. Is this a result set?
##FINAL EDIT:
Nvm I got it. Turns out I had overwritten 'str' as a variable. I got rid of that and imported 'builtins' just to be safe.
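
For anyone who lands here with the same error, here is a minimal sketch of how rebinding 'str' can produce this kind of TypeError. The lambda is a hypothetical stand-in (the original post does not show what was actually assigned to 'str'), and the function names are made up for illustration.

```python
import builtins


def shadowing_demo(page_counter):
    # Hypothetical accidental rebinding: 'str' now refers to something
    # that returns a list-like object instead of text.
    str = lambda value: []
    try:
        # Same pattern as the failing line: text + str(number)
        return "page=" + str(page_counter)
    except TypeError as error:
        # Concatenating text with a list raises TypeError.
        return error


def build_url(part_1, page_counter, part_2):
    # builtins.str always refers to the real built-in type,
    # no matter what the bare name 'str' points to elsewhere.
    return part_1 + builtins.str(page_counter) + part_2


print(repr(shadowing_demo(6)))
print(build_url("https://example.invalid/page*G", 6, "*N"))
```

In an interactive session, 'del str' removes a module-level name that shadows the built-in, after which 'str' resolves to the built-in again. Not reusing built-in names for variables avoids the problem in the first place.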

Use this url: https://www.businesswire.com/portal/site...PAGE/news/

It would really be helpful if you could show the error verbatim.
They contain some very useful information.
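
As a rough sketch of one way to keep the full error visible inside a retry loop, the traceback can be printed before sleeping rather than swallowed by a bare except. The helper name fetch_with_retries and its parameters are hypothetical, not taken from the posted code, and the fetch argument is assumed to be any callable that takes a URL.

```python
import time
import traceback


def fetch_with_retries(fetch, url, max_attempts=5, delay_seconds=30):
    """Call fetch(url), retrying on failure and printing each traceback."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception:
            # Print the complete traceback so the error can be posted verbatim.
            traceback.print_exc()
            if attempt == max_attempts:
                # Give up and let the last error propagate to the caller.
                raise
            time.sleep(delay_seconds)
```

With the posted code, fetch could be something like lambda u: urllib.request.urlopen(u).read(). Catching Exception instead of using a bare except also leaves Ctrl+C able to stop the loop.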

(Feb-15-2018, 06:35 AM) Larz60+ wrote: Use this url: https://www.businesswire.com/portal/site...PAGE/news/

It would really be helpful if you could show the error verbatim.
They contain some very useful information.

Nvm I got it. Turns out I had overwritten 'str' as a variable. I got rid of that and imported 'builtins' just to be safe.

Thank you for the fast reply though, and have a nice night.

(Feb-15-2018, 06:44 AM) HiImNew wrote: Turns out I had overwritten 'str' as a variable.

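You do the same with list(): the posted code names a variable 'list', which hides the built-in list type for the rest of the session.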
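
A small sketch of the same link-collecting step with a descriptive variable name, so the built-in list() stays usable. The function name collect_links and the variable found_urls are made up for illustration. Only the base URL comes from the posted code.

```python
def collect_links(hrefs, base="https://www.businesswire.com"):
    """Build absolute URLs from relative hrefs and drop duplicates."""
    found_urls = []  # descriptive name instead of 'list'
    for href in hrefs:
        found_urls.append(base + href)
    # The built-in list() is still available because nothing shadows it.
    # dict.fromkeys keeps the first occurrence of each URL, in order.
    return list(dict.fromkeys(found_urls))


print(collect_links(["/news/a", "/news/b", "/news/a"]))
```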

Posters in this thread, in order: HiImNew, Larz60+, HiImNew, buran