Jan-24-2020, 04:37 PM
Dear all, this is my first time asking a question on a forum. I would be very grateful if you could help me.
I succeeded in scraping data from a website with the code below, and then extended it to repeat the process many times. The HTTP error 500 makes me believe that the code may be correct and that the server may simply not allow me to scrape as much as I want to. I do not know much about programming, so I wonder if there is a simple solution, or if I should learn much more about coding before I can solve this kind of issue. The code is the following:
# Imports for the first part of the code
import codecs
import re
import unidecode

# Imports for the second part of the code
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl
import pandas as pd

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

# Part 1: Read a file with the names of all cities in Goiás and
# normalize each name to match the URL format of the IBGE website
lst_cidades = list()
handle = codecs.open('cidades_goias_pequeno.txt', 'r', 'utf8')
for line in handle:
    line = line.rstrip()
    line = line.lower()
    line = re.sub(" ", "-", line)
    cidade = unidecode.unidecode(line)
    lst_cidades.append(cidade)
#print(lst_cidades)

# Part 2: Build every URL for the pages that show the number of
# cattle in each city, for the years 2004 to 2018
lst3 = list()
for item in lst_cidades:
    link_skeleton = 'https://cidades.ibge.gov.br/brasil/go/' + str(item) + '/pesquisa/18/16459?ano='
    #print('Links for the city of', item, ':')
    #print(link_skeleton)
    lst2 = list()
    for number in range(2004, 2019):
        link = link_skeleton + str(number)
        html = urllib.request.urlopen(link, context=ctx).read()
        soup = BeautifulSoup(html, 'html.parser')
        # Keep only the first table with this class
        table = soup.find_all('table', class_='mobile-table')[0]
        lst = list()
        for row in table.find_all('tr', class_='nivel-3 mobile-tr aberto'):
            for cell in row.find_all('td', class_='valor s'):
                lin = cell.text.rstrip()
                # Keep only the first matching value
                if len(lst) == 0:
                    lst.append(lin)
        for i in lst:
            bois = i
            lst2.append(bois)
    lst3.append(lst2)
print(lst3)
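A server often answers with HTTP 500 (or 503) when it is overloaded or is throttling a client that sends many requests in a row. A common workaround is to pause between requests and retry failed ones a few times before giving up. Below is a minimal sketch of such a retry helper; the function name fetch_with_retry, the retry count, and the delay are my own choices, not part of any library, and the opener parameter only exists so the logic can be tested without a network:

```python
import time
import urllib.error
import urllib.request

def fetch_with_retry(url, retries=3, delay=5.0, opener=urllib.request.urlopen):
    """Fetch url, retrying on HTTP 5xx errors with a pause between attempts.

    Non-5xx errors (e.g. 404 for a misspelled city name) are raised
    immediately, since retrying would not help there.
    """
    for attempt in range(retries):
        try:
            return opener(url).read()
        except urllib.error.HTTPError as e:
            # Give up on client errors, or when attempts are exhausted
            if e.code < 500 or attempt == retries - 1:
                raise
            time.sleep(delay)
```

In the scraping loop you would replace the direct urllib.request.urlopen(link, context=ctx).read() call with fetch_with_retry(link) (passing the ssl context through a custom opener if needed), and you could also add a short time.sleep between cities to stay polite to the server.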