
 Formatting Output after Web Scrape
#1
Hi,

I have scraped an HTML table using BeautifulSoup and requests, and now I'm trying to create a DataFrame from the results. My current output is:

['Adams, Chris ']
['ALBANESE, MARC ']
['BAILEY, MARIO ']
['DIAZ DE LA PORTILLA, ESQ., MIGUEL ']
['GONZALEZ, JOSE M']
['JOHNSON, ERIN E']
['KESTI, MICHAEL ']
['KESTI, MICHAEL ']
['KRISCHER, ALAN ']

But I want to format it so it looks like this:

[['Adams, Chris '], ['ALBANESE, MARC '], ['BAILEY, MARIO '], ['DIAZ DE LA PORTILLA, ESQ., MIGUEL '], ['GONZALEZ, JOSE M'], ['JOHNSON, ERIN E'], ['KESTI, MICHAEL '], ['KESTI, MICHAEL '], ['KRISCHER, ALAN ']]

So that it can be inserted into a dictionary such as this:

dictionary={'Name':[['Adams, Chris '], ['ALBANESE, MARC '], ['BAILEY, MARIO '], ['DIAZ DE LA PORTILLA, ESQ., MIGUEL '], ['GONZALEZ, JOSE M'], ['JOHNSON, ERIN E'], ['KESTI, MICHAEL '], ['KESTI, MICHAEL '], ['KRISCHER, ALAN ']]}
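For illustration, here is a minimal sketch of that transformation, assuming the scrape yields one single-element list per row (the `rows` data below is hypothetical, mirroring the printed output above): create the outer list once, before the loop, and append each row's one-element list to it.

```python
# Hypothetical per-row output, mirroring the printed lists above
rows = [['Adams, Chris '], ['ALBANESE, MARC '], ['BAILEY, MARIO ']]

names = []
for row in rows:
    names.append(row)  # append the whole one-element list, not its contents

dictionary = {'Name': names}
print(dictionary)
# {'Name': [['Adams, Chris '], ['ALBANESE, MARC '], ['BAILEY, MARIO ']]}
```

The key point is that `names = []` must live outside the loop; initializing it inside the loop resets it on every row, which is why only single lists get printed.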

So far, this is part of my code:

    new_list = []
    names_list = cell[0].string
    li = list(names_list.split("-"))
    #dic={'Name':names_list}
    #print(dic)
    print(li)
    #lit = li.append(cell[0].string)
    #print(names_list)
Thanks for any help. Let me know if my head is in the right place, what the general scheme for doing this should be, and whether there is a related problem I should be aware of. Thanks so much!
#2
Hello, can you post the code which you used for scraping?
#3
import requests
from bs4 import BeautifulSoup, SoupStrainer
import pandas as pd

website_url = requests.get('https://www8.miamidade.gov/Apps/COB/LobbyistOnline/Views/Queries/Registration_ByPeriod_List.aspx?startdate=07%2f01%2f2019&enddate=07%2f05%2f2019')
content = website_url.text
#print(content)

soup = BeautifulSoup(content,'lxml')
table = soup.find('table', attrs={'id': 'ctl00_mainContentPlaceHolder_gvLobbyistRegList'})
#print(table.prettify())

#print(table.get_text())
#remove = table.decompose("//*[@id='ctl00_mainContentPlaceHolder_gvLobbyistRegList']/tbody/tr[25]")



rows_in_table = []
#columnNumber=0
# create loop on top to remove row 1 consideration 
for row in table.findAll('tr')[1:]:
    #print(row.prettify())
    #print('\n')
    #testing
    cell = row.findAll('td')
    #print(cell)
    #print(type(cell))
    if len(rows_in_table) == 0: # 0 == 0
        rows_in_table = [None for _ in cell] #loops find all td elements
        #print(rows_in_table)
    elif len(cell) != len(rows_in_table): # 4 != 5
        for index, rowspan in enumerate(rows_in_table):
            if rowspan is not None:
                value = rowspan["value"]
                cell.insert(index, value)
                #print(index) # 0
                #print(value) # kesti michael 

                #decreases rows by 1
                if rows_in_table[index]["rows_left"] == 1:
                    rows_in_table[index] = None
                else:
                    rows_in_table[index]["rows_left"] -= 1 #decrease by one row
    #print(cell[0].string)
    new_list = []
    #print(type(new_list))
    names_list = cell[0].string
    li = list(names_list.split("-"))
    #dic={'Name':cell[0].string}
    #print(dic)
    #print(li)
    #lit = li.append(cell[0].string)
    #print(names_list)
    

    #print(rows_in_table)
    for index, x in enumerate(cell):
        #print(index)
        #print(x)
        
        #print(x.content)
        #text = x.text.replace(' ','')
        if x.has_attr("rowspan"):
            rowspan = {"rows_left": int(x["rowspan"]), "value": x}
            rows_in_table[index] = rowspan
        #contentt = x.text
        #y = pd.DataFrame(contentt)
        #columnNumber+=1
            #df = pd.DataFrame(rowspan)
            #print(df)
        
                #read title (content) of column
##        columnName = x.text
##        print('%d: %s' % (columnNumber,columnName))
##        rows_in_table.append((columnName,[]))    
    #print(rows_in_table)
    

'''
    list_of_cells = []
    #print('row ' + str(len(table.findAll('tr'))))
    for cell in row.findAll('td'):
        text = cell.text
        #width of table == 5
        if len(row.findAll('td')) != 5:
            for index, rowspan in enumerate(rows_in_table):
                if rowspan is not None:
                    combine_rows = rowspan["value"]
                    #print(combine_rows)

                    test = cell.insert(index, combine_rows)
                    print(test)
                    
    
##        Create a conditional here to indicate that if text houses 4 entries,
##        then append to previous 5 entry cell.
        
##        if row.find('td', attrs={'rowspan':'2'}):
##            list_of_cells.append.previous_node?
##        else:
##            list_of_cells.append(text)
                
            
        list_of_cells.append(text)
    print(list_of_cells)
'''
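Once each row's cells are normalized for rowspans, the nested name list asked for in the first post can be built by appending `[name]` per row. A standalone sketch of the carry-forward idea, using hypothetical `parsed_rows` data (no scraping, for illustration only):

```python
# Hypothetical parsed rows; None in position 0 stands for a cell
# consumed by a rowspan from the row above.
parsed_rows = [
    ['KESTI, MICHAEL ', 'A', 'B'],
    [None, 'C', 'D'],  # rowspan continues: reuse the previous name
    ['KRISCHER, ALAN ', 'E', 'F'],
]

names = []
last_name = None
for row in parsed_rows:
    name = row[0] if row[0] is not None else last_name  # carry name forward
    last_name = name
    names.append([name])  # one-element list per row, as requested

print(names)
# [['KESTI, MICHAEL '], ['KESTI, MICHAEL '], ['KRISCHER, ALAN ']]
```

This would also explain the duplicate 'KESTI, MICHAEL' rows in the output above: a rowspan cell is repeated once per row it spans.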
metulburr wrote Jul-30-2019, 09:02 PM:
Please post all code, output and errors (in their entirety) between their respective tags. I did it for you this time. Here are instructions on how to do it yourself next time.
