Jul-31-2019, 08:53 PM
Hi (again, I'm redoing my previous post with more detail),
I am currently scraping a site using the start date 07/01/2019 and the end date 07/05/2019. The URL is Lobbyist Registrations
I am scraping the table on that site and trying to place its data in a newly created table using Python. I can retrieve the data, but I'm stuck on moving it into a table. I have tried to build a list or a DataFrame from it, without success. Any help is appreciated!
So far, I have successfully scraped the table using the following code:
import PipPackages
import requests
from bs4 import BeautifulSoup, SoupStrainer
import pandas as pd

website_url = requests.get('https://www8.miamidade.gov/Apps/COB/LobbyistOnline/Views/Queries/Registration_ByPeriod_List.aspx?startdate=07%2f01%2f2019&enddate=07%2f05%2f2019')
content = website_url.text
soup = BeautifulSoup(content,'lxml')
table = soup.find('table', attrs={'id': 'ctl00_mainContentPlaceHolder_gvLobbyistRegList'})
rows_in_table = []
for row in table.findAll('tr')[1:]:
    cell = row.findAll('td')
    if len(rows_in_table) == 0:  # 0 == 0
        rows_in_table = [None for _ in cell]
    elif len(cell) != len(rows_in_table):  # 4 != 5
        for index, rowspan in enumerate(rows_in_table):
            if rowspan is not None:
                value = rowspan["value"]
                cell.insert(index, value)  # when, index = 0 & value = kesti michael
                if rows_in_table[index]["rows_left"] == 1:
                    rows_in_table[index] = None
                else:
                    rows_in_table[index]["rows_left"] -= 1
    print(cell[0].string)
    #new_list = []
    #names_list = cell[0].string
    #li = list(names_list.split("-"))
    for index, x in enumerate(cell):
        if x.has_attr("rowspan"):
            rowspan = {"rows_left": int(x["rowspan"]), "value": x}
            rows_in_table[index] = rowspan

The output from the code above is the following (of course, because the print statement uses cell[0].string as its argument):
Output:
Adams, Chris
ALBANESE, MARC
BAILEY, MARIO
DIAZ DE LA PORTILLA, ESQ., MIGUEL
GONZALEZ, JOSE M
JOHNSON, ERIN E
KESTI, MICHAEL
KESTI, MICHAEL
KRISCHER, ALAN
KUIPER, KENNETH A
LASARTE, FELIX M
LERMA, LISA I
MANZANO, JESSE
MARTINEZ DE CASTRO, ORLANDO
OTAZO, JULIO O
PRAGER, RICHARD S
RIVET, JEFFREY J
RUIZ-DIAZ DE LA PORTILLA, ELINETTE
TAYLOR, MICHAEL
THOMSON, CHRISTIAN
WELLS, GARY T
WOLF, JAMES M
ZELEDON, CLARIMAR
Believe me, I have attempted to place the output above into a list, because I wanted a list to put in a dictionary like dictionary = {'Name':list, 'Date':date}, but I have failed, as shown below (Python code, then output):

#print(cell[0].string)
new_list = []
names_list = cell[0].string
li = list(names_list.split("-"))
print(li + new_list)  # attempt #1
#print(li.append(new_list))  # attempt #2
# I TRIED TO WORK WITH A DICTIONARY IN ONE ATTEMPT
#dictionary={'Names':names_list}
#df = pd.DataFrame(dictionary)  # attempt #3
#print(df)  # this prints 'Names' : ...names... in a long list

Based on my first attempt, I get this:
Output:
['Adams, Chris ']
['ALBANESE, MARC ']
['BAILEY, MARIO ']
['DIAZ DE LA PORTILLA, ESQ., MIGUEL ']
['GONZALEZ, JOSE M']
['JOHNSON, ERIN E']
['KESTI, MICHAEL ']
['KESTI, MICHAEL ']
['KRISCHER, ALAN ']
['KUIPER, KENNETH A']
['LASARTE, FELIX M']
['LERMA, LISA I']
['MANZANO, JESSE ']
['MARTINEZ DE CASTRO, ORLANDO ']
['OTAZO, JULIO O']
['PRAGER, RICHARD S']
['RIVET, JEFFREY J']
['RUIZ', 'DIAZ DE LA PORTILLA, ELINETTE ']
['TAYLOR, MICHAEL ']
['THOMSON, CHRISTIAN ']
['WELLS, GARY T']
['WOLF, JAMES M']
['ZELEDON, CLARIMAR ']
['\xa0']
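A note on the output above, offered as a guess from the snippet alone: each line shows its own short list because new_list and li are created again on every pass of the loop, split("-") cuts the hyphenated surname RUIZ-DIAZ in two, and the final '\xa0' is a non-breaking space from a blank cell. The minimal sketch below shows one way to collect the names into a single list instead. The raw_cells values are copied from the output above and stand in for cell[0].string; the variable names raw_cells and names are illustrative only.

```python
# Sample cell strings copied from the output above; in the real loop
# each value would come from cell[0].string (which can also be None).
raw_cells = [
    "Adams, Chris ",
    "KESTI, MICHAEL ",
    "RUIZ-DIAZ DE LA PORTILLA, ELINETTE ",
    "\xa0",  # blank cell from the last table row
]

names = []  # created once, before the loop, so it keeps growing

for text in raw_cells:
    # str.strip() with no arguments removes surrounding whitespace,
    # including the non-breaking space \xa0.
    cleaned = text.strip() if text is not None else ""
    if cleaned:  # skip cells that are empty after stripping
        names.append(cleaned)

print(names)
```

Leaving out split("-") keeps hyphenated surnames intact, and the emptiness check drops the blank trailing row.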
Can someone help me work out how to get this output properly formatted as a table, such as a DataFrame? Any method that achieves this would be great! Thank you so much for the help!
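For illustration, here is a minimal sketch of how collected rows might be turned into a DataFrame, assuming each table row has already been reduced to a name and a date. The column names come from the dictionary idea mentioned above. The date values are made-up placeholders inside the queried range, not data from the site, and the rows variable is hypothetical.

```python
import pandas as pd

# Stand-in rows. The names appear in the output above; the dates are
# placeholders invented for this example, NOT values scraped from the site.
rows = [
    {"Name": "Adams, Chris", "Date": "07/01/2019"},
    {"Name": "ALBANESE, MARC", "Date": "07/02/2019"},
    {"Name": "BAILEY, MARIO", "Date": "07/03/2019"},
]

# A list of dicts maps directly onto a DataFrame: one dict per row,
# one key per column. Passing columns= fixes the column order.
df = pd.DataFrame(rows, columns=["Name", "Date"])

print(df)
```

In the scraping loop, the same shape could be produced by appending one dict per table row to a list created before the loop, then building the DataFrame once after the loop ends. pandas.read_html may also be worth a look, since it parses HTML tables directly, though whether it treats this page's rowspan cells as wanted would need checking.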