python beginner HTTP Error 500

leofcastro · Jan-24-2020, 04:37 PM

Dear all, this is my first time asking a question on a forum. I would be very grateful if you could help me.

I have succeeded in scraping data from a website with a piece of code. I then wrote more code to repeat the process many times. The resulting HTTP Error 500 makes me think that the code may be correct and that the server simply does not allow me to scrape as much as I want. I do not know much about programming, so I wonder whether there is a simple solution or whether I need to learn much more about coding before I can solve an issue like this. The code is the following:

# Imports for the first part of the code
import codecs
import re
import unidecode
# Imports for the second part of the code
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl
import pandas as pd

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

# Part 1: Read a file with the names of all cities in Goiás and adapt each name to the URL format of the IBGE website
lst_cidades=list()
# The with block closes the file once all lines have been read
with codecs.open('cidades_goias_pequeno.txt','r','utf8') as handle:
	for line in handle:
		line=line.rstrip()
		line=line.lower()
		line= re.sub(" ", "-", line)
		cidade = unidecode.unidecode(line)
		lst_cidades.append(cidade)
#print(lst_cidades)
# Part 2: Build every URL for the pages that show the number of cattle in each city from 2004 to 2018
lst3=list()
for item in lst_cidades:
	link_skeleton='https://cidades.ibge.gov.br/brasil/go/'+ str(item) + '/pesquisa/18/16459?ano='
	#print('Links para a cidade de', item,':')
	#print(link_skeleton)
	lst2=list()
	# One request per year; range(2004,2019) covers 2004 through 2018
	for ano in range(2004,2019):
		link=link_skeleton + str(ano)
		html = urllib.request.urlopen(link, context=ctx).read()
		soup = BeautifulSoup(html, 'html.parser')
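		table=soup.find_all('table',class_='mobile-table')
		table=table[0]
		#print(type(table))

		# Keep only the first matching 'valor s' cell
		lst=list()
		for row in table.find_all('tr',class_='nivel-3 mobile-tr aberto'):
			for cell in row.find_all('td',class_='valor s'):
				lin=cell.text
				lin=lin.rstrip()
				if len(lst)==0:
					lst.append(lin)
		# Use None when no cell matched, so a missing value neither raises NameError
		# nor silently repeats the previous year's number
		bois=lst[0] if lst else None
		lst2.append(bois)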
	lst3.append(lst2)
print(lst3)
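
One way to check whether the server is only failing on some requests is to catch the error and try again after a pause, instead of letting the first HTTP Error 500 stop the whole loop. The sketch below is only an illustration under assumptions: the helper name fetch_with_retry and its parameters (opener, retries, delay, sleep) are hypothetical and not part of the script above, and whether the IBGE server recovers after a pause is not known.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, opener=urllib.request.urlopen, retries=3, delay=2.0, sleep=time.sleep):
    """Return the response body for url, or None if every attempt hit a 5xx error.

    fetch_with_retry is a hypothetical helper. `opener` and `sleep` can be
    replaced so the function can be tried without touching the network.
    """
    for attempt in range(1, retries + 1):
        try:
            return opener(url).read()
        except urllib.error.HTTPError as err:
            # Codes below 500 (for example 404) are not retried here.
            if err.code < 500:
                raise
            print(f"{url}: HTTP {err.code} (attempt {attempt}/{retries})")
            if attempt < retries:
                # Wait a little longer after each failed attempt.
                sleep(delay * attempt)
    return None
```

In the loop above, the call to urllib.request.urlopen(link, context=ctx).read() could be swapped for fetch_with_retry(link, opener=lambda u: urllib.request.urlopen(u, context=ctx)), skipping the year when the result is None. If one URL fails on every attempt while others succeed, the problem is more likely that particular address (for example, how the city name was converted) than the number of requests.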
