Python Forum
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
python beginner HTTP Error 500
#1
Dear all, this is my first time sharing doubts in a forum. I would be very grateful if you could help me.

I have succeeded in scrapping the data of a website using a certain code. Then I wrote some code to reproduce the process multiple times. The error message error 500 makes me believe that the code could be right and that the server could simply not allow me to scrap as much I want to. I do not know much about programming. Thus, I wonder if there is a simple solution or if I should simply learn way more about coding before being able to solve such an issue. The code is the following:

# Import for the first part of the code
import codecs
import re
import unidecode
# Import for the second part of the code
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl
import re
import pandas as pd

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

# Part 1: Import a file with the names of all cities in goiás and make changes according to the url of the IBGE website
lst_cidades=list()
handle=codecs.open('cidades_goias_pequeno.txt','r','utf8')
for line in handle:
	line=line.rstrip()
	line=line.lower()
	line= re.sub(" ", "-", line)
	cidade = unidecode.unidecode(line)
	lst_cidades.append(cidade)
#print(lst_cidades)
# Part 2: I will now try to create all the possible url that display the pages where I can find the number of cattle in each city from 2004 to 2008
lst3=list()
for item in lst_cidades:
	link_skeleton='https://cidades.ibge.gov.br/brasil/go/'+ str(item) + '/pesquisa/18/16459?ano='
	#print('Links para a cidade de', item,':')
	#print(link_skeleton)
	list_ano=list(range(2004,2019))
	number=2003
	lst2=list()
	for i in list_ano:
		number=number+1
		link=link_skeleton + str(number)
		html = urllib.request.urlopen(link, context=ctx).read()
		soup = BeautifulSoup(html, 'html.parser')
		table=soup.find_all('table',class_='mobile-table')

		table=table[0]
        #print(type(table))

		lst=list()
		for row in table.find_all('tr',class_='nivel-3 mobile-tr aberto'):
			for cell in row.find_all('td',class_='valor s'):
				lin=cell.text
				lin=lin.rstrip()
				if len(lst)==0:
					lst.append(lin)
		for i in lst:
			bois=i
		lst2.append(bois)
	lst3.append(lst2)
print(lst3)
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  HTTP 404 error with Session Pool Clives 0 1,285 Jun-17-2021, 06:45 PM
Last Post: Clives
  error HTTP Error 403: Forbidden local_bit 1 2,781 Nov-14-2020, 11:34 AM
Last Post: ndc85430
  How to send unicode string encoded in utf-8 in http request in Python MaverinCode 1 32,356 Nov-08-2020, 06:45 AM
Last Post: JaiM
  Beginner: urllib error tomfry 7 6,470 May-03-2020, 04:35 AM
Last Post: Larz60+
  urllib.error.HTTPError: HTTP Error 404: Not Found ckkkkk 4 8,635 Mar-03-2020, 11:30 AM
Last Post: snippsat
  HTTP error 404 Karin 4 4,657 May-31-2019, 02:23 PM
Last Post: snippsat
  How to check HTTP error 500 and bypass SriMekala 3 10,376 May-04-2019, 02:07 PM
Last Post: snippsat
  Syntax error for HTTP request GET THX1138 1 6,345 May-12-2018, 12:02 PM
Last Post: snippsat

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020