Feb-15-2018, 06:21 AM
I have tried freaking everything, and for some reason Google searches aren't turning up much.
This is code to parse multiple pages of the same URL, but for some reason I keep getting this error.
Here is a fragment of the code that keeps giving the error. It is part of a larger program, but I, or rather Python, has narrowed the error down to this segment:
```python
import time
import urllib.request

import bs4 as bs

# `worksheet_list` is defined in the larger program this fragment comes from.
while True:
    list = []  # note: this name shadows the built-in list
    listalmostfinal = []
    listfinal = []
    part_1 = 'https://www.businesswire.com/portal/site/home/template.PAGE/news/?javax.portlet.tpst=ccf123a93466ea4c882a06a9149550fd&javax.portlet.prp_ccf123a93466ea4c882a06a9149550fd_viewID=MY_PORTAL_VIEW&javax.portlet.prp_ccf123a93466ea4c882a06a9149550fd_ndmHsc=v2*A1515934800000*B1518565773469*DgroupByDate*G'
    part_2 = '*N1000003&javax.portlet.begCacheTok=com.vignette.cachetoken&javax.portlet.endCacheTok=com.vignette.cachetoken'
    page_counter = 1
    pi = 0
    url = ';-;'
    while True:
        # Stop flipping pages once any collected link is already in the worksheet.
        page_flip = 'yes'
        for i in list:
            for j in worksheet_list:
                if i == j:
                    page_flip = 'no'
        if page_flip == 'no':
            for i in list:
                if i not in listalmostfinal:
                    listalmostfinal.append(i)
            if len(listalmostfinal) == 0:
                print('Error New Part')
            break
        else:
            page_counter += 1
            url = (part_1 + str(page_counter) + part_2)  # the line in the traceback
            # Retry the request, sleeping 30 seconds after each failure.
            while True:
                try:
                    sauce = urllib.request.urlopen(url).read()
                    break
                except Exception:
                    time.sleep(30)
                    pi += 1
                    if pi >= 5:
                        print('BW Search Access Failure')
                        print(pi)
            soup = bs.BeautifulSoup(sauce, 'lxml')
            # Collect up to 25 article links from the current results page.
            for a in soup.find_all('a', class_='bwTitleLink', limit=25):
                initialbusinesswireurls = ('https://www.businesswire.com' + a['href'])
                list.append(initialbusinesswireurls)
            while True:
                if page_counter > 5:
                    print('Flipping Pages to Page ' + str(page_counter))
                break
```

Here is the output:
```
Flipping Pages to Page 6
Flipping Pages to Page 7
Flipping Pages to Page 8
Flipping Pages to Page 9
Flipping Pages to Page 10
Traceback (most recent call last):
  File "<pyshell#3>", line 74, in <module>
    url = (part_1 + str(page_counter) + part_2)
TypeError: must be str, not ResultSet
>>>
```

I'm sure someone experienced with parsing knows what this is, but I have no clue. The "url = ';-;'" line was my attempt to reset the variable I thought was causing the problem, since the code works the first time through, just not the second or third. If anyone knows what this is and/or how to fix it, please help. Thank you!
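The traceback says the `+` on the URL line received a BeautifulSoup `ResultSet` where it expected a string. `ResultSet` is a subclass of `list`, so a plain empty list reproduces the same kind of `TypeError` (the exact message wording differs between Python versions). The sketch below mirrors the failing line with placeholder URL halves (`PART_1`, `PART_2` and the helper `page_url` are hypothetical, not from the original code), and also shows an f-string version, which formats the number directly and never looks up the name `str`.

```python
def page_url(part_1, part_2, page_text):
    # Same shape as the failing line: part_1 + <page text> + part_2
    return part_1 + page_text + part_2


# Placeholder halves standing in for the long businesswire.com URL parts.
PART_1 = "https://example.com/news?page="
PART_2 = "&view=list"

print(page_url(PART_1, PART_2, "6"))  # works: the middle piece is a str

try:
    # bs4's ResultSet subclasses list, so an empty list fails the same way here.
    page_url(PART_1, PART_2, [])
except TypeError as exc:
    print("TypeError:", exc)

# An f-string formats the number itself, without calling whatever `str` names.
page_counter = 6
print(f"{PART_1}{page_counter}{PART_2}")
```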
#EDIT: Forgot imports in code
#ANOTHER EDIT:
```
>>> str(page_counter)
[]
```

This should output a number, but for some reason it just gives brackets. Is this a ResultSet?
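If `str(page_counter)` prints `[]`, the name `str` no longer refers to the built-in type, and whatever it now points at returned an empty list-like object. In the shell, typing `str` on its own should show `<class 'str'>`; anything else means it was reassigned. The post doesn't show the reassignment, so this sketch uses hypothetical stand-ins (`FakeResultSet`, `fake_find_all`). One plausible real-world route: calling a BeautifulSoup `Tag` (or the `BeautifulSoup` object itself) is shorthand for its `find_all()` method, so if a soup object had been assigned to `str`, `str(page_counter)` could return a `ResultSet` like the one in the traceback.

```python
import builtins


class FakeResultSet(list):
    """Hypothetical stand-in for bs4's ResultSet, which subclasses list."""


def fake_find_all(*args, **kwargs):
    """Hypothetical stand-in for a find_all-style call that matches nothing."""
    return FakeResultSet()


def diagnose():
    str = fake_find_all  # the accidental reassignment, kept local for the demo
    page_counter = 6
    result = str(page_counter)
    print(repr(result))                # []
    print(type(result).__name__)       # FakeResultSet
    print(str is builtins.str)         # False: `str` is shadowed here
    print(builtins.str(page_counter))  # 6: the real built-in still works
    return result


diagnose()
```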
##FINAL EDIT:
Nvm, I got it. Turns out I had overwritten 'str' with a variable of my own. I got rid of that and imported 'builtins' just to be safe.
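Renaming the variable is the real fix. For a session where `str` is already clobbered, `del str` removes the shadowing global so lookups fall back to the built-in, and `builtins.str` reaches the built-in directly. Importing `builtins` on its own doesn't undo the shadowing; it only helps if you call `builtins.str(...)` explicitly. A minimal sketch, where the function named `str` is a hypothetical stand-in for whatever value was accidentally assigned to that name:

```python
import builtins


def str(value):  # hypothetical stand-in for the accidental reassignment
    return []


print(str(6))           # [] : the name now points at the stand-in
print(builtins.str(6))  # 6  : the real built-in is still reachable
del str                 # drop the shadowing global
print(str(6))           # 6  : lookups fall back to the built-in again
```

The same trap applies to other built-in names: the fragment in the question also uses `list` as a variable, so a descriptive name such as `page_links` (hypothetical) would avoid it.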