Python Forum
python beginner HTTP Error 500 - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html)
+--- Thread: python beginner HTTP Error 500 (/thread-23957.html)



python beginner HTTP Error 500 - leofcastro - Jan-24-2020

Dear all, this is my first time asking a question on a forum. I would be very grateful if you could help me.

I have succeeded in scraping data from a website with a certain piece of code. Then I wrote some code to repeat the process multiple times. The HTTP Error 500 message makes me believe that the code could be right and that the server may simply not allow me to scrape as much as I want to. I do not know much about programming, so I wonder if there is a simple solution, or if I should learn much more about coding before being able to solve such an issue. The code is the following:

# Import for the first part of the code
import codecs
import re
import unidecode
# Import for the second part of the code
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl
import pandas as pd

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

# Part 1: read the file with the names of all cities in Goiás and adjust them to match the URL format of the IBGE website
lst_cidades=list()
handle=codecs.open('cidades_goias_pequeno.txt','r','utf8')
for line in handle:
	line=line.rstrip()
	line=line.lower()
	line= re.sub(" ", "-", line)
	cidade = unidecode.unidecode(line)
	lst_cidades.append(cidade)
#print(lst_cidades)
# Part 2: build every URL for the pages that show the number of cattle in each city, one page per year from 2004 to 2018
lst3=list()
for item in lst_cidades:
	link_skeleton='https://cidades.ibge.gov.br/brasil/go/'+ str(item) + '/pesquisa/18/16459?ano='
	#print('Links para a cidade de', item,':')
	#print(link_skeleton)
	lst2=list()
	for ano in range(2004, 2019):
		link=link_skeleton + str(ano)
		html = urllib.request.urlopen(link, context=ctx).read()
		soup = BeautifulSoup(html, 'html.parser')
		table=soup.find_all('table',class_='mobile-table')[0]
		#print(type(table))

		lst=list()
		for row in table.find_all('tr',class_='nivel-3 mobile-tr aberto'):
			for cell in row.find_all('td',class_='valor s'):
				lin=cell.text
				lin=lin.rstrip()
				if len(lst)==0:
					lst.append(lin)
		bois=lst[0] if lst else None	# first matching cell, or None when the page has no such row
		lst2.append(bois)
	lst3.append(lst2)
print(lst3)
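A 500 response often means the server hiccuped or is pushing back against rapid-fire requests, so one common workaround is to pause and retry a few times before giving up. The sketch below is one way to do that, not the forum's official answer; the helper name `fetch_with_retry` and the retry/delay values are my own choices. It takes an optional `opener` argument (defaulting to `urllib.request.urlopen`) only so it can be tested without touching the network.

```python
import time
import urllib.request
import urllib.error

def fetch_with_retry(url, retries=3, delay=5.0, opener=None):
    """Fetch a URL, retrying on server errors (HTTP 5xx) with a growing pause.

    `opener` defaults to urllib.request.urlopen; it is a parameter only so
    the retry logic can be exercised without a real network connection.
    """
    if opener is None:
        opener = urllib.request.urlopen
    for attempt in range(retries):
        try:
            return opener(url).read()
        except urllib.error.HTTPError as err:
            # A 5xx code means the server failed (or is throttling us);
            # anything else, or the final attempt, is re-raised as-is.
            if err.code < 500 or attempt == retries - 1:
                raise
            time.sleep(delay * (attempt + 1))  # back off a little more each retry
```

In the loop above you could then replace the `urllib.request.urlopen(link, context=ctx).read()` call with `fetch_with_retry(link, opener=lambda u: urllib.request.urlopen(u, context=ctx))`, and it may also help to add a short `time.sleep(1)` between requests so the server is not hit with one request per city per year as fast as the loop can run.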