Python Forum
web scraping to csv formatting problems
#5
It's mighty difficult to give advice without looking at the page.
The usual layout for a table is to have multiple tr's, with multiple td's within each tr.
Here's an example of this on a simple page with only one table:
        table = soup.find('table', {'summary': 'This table displays Connecticut towns and the year of their establishment.'})
        trs = table.tbody.find_all('tr')

        for n, tr in enumerate(trs):
            for n1, td in enumerate(self.get_td(tr)):
                print(f'==================================== tr {n}, td: {n1} ====================================')
                print(f'{self.pp.prettify(td, 2)}')
This will give you a layout of the page and make it easier to determine how to proceed.
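For anyone reading without the rest of the class, here is a minimal standalone sketch of the same row-by-row walk. The sample HTML, the helper name iter_cells and the use of get_text are assumptions for illustration only; the self.get_td helper used above is not shown in this post, so the sketch calls find_all directly instead.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; replace with the HTML you actually fetched.
SAMPLE_HTML = """
<table summary="towns">
  <tbody>
    <tr><td>Town A</td><td>1700</td></tr>
    <tr><td>Town B</td><td>1750</td></tr>
  </tbody>
</table>
"""


def iter_cells(html):
    """Yield (row index, cell index, cell text) for every td in the first table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    for n, tr in enumerate(table.find_all("tr")):
        for n1, td in enumerate(tr.find_all("td")):
            yield n, n1, td.get_text(strip=True)


cells = list(iter_cells(SAMPLE_HTML))
for n, n1, text in cells:
    print(f"tr {n}, td {n1}: {text}")
```

Once the row and cell indexes are visible like this, it is usually easier to decide which cells belong in which csv column.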
The prettify method is in the module PrettifyPage.py, a modified version of BeautifulSoup's prettify that allows changing the indent size:

from bs4 import BeautifulSoup
import requests
import pathlib


class PrettifyPage:
    def __init__(self):
        pass
        
    def prettify(self, soup, indent):
        pretty_soup = str()
        previous_indent = 0
        for line in soup.prettify().split("\n"):
            current_indent = str(line).find("<")
            if current_indent == -1 or current_indent > previous_indent + 2:
                current_indent = previous_indent + 1
            previous_indent = current_indent
            pretty_soup += self.write_new_line(line, current_indent, indent)
        return pretty_soup

    def write_new_line(self, line, current_indent, desired_indent):
        # prettify() already indents one space per level, so add only the difference.
        spaces_to_add = (current_indent * desired_indent) - current_indent
        new_line = " " * max(spaces_to_add, 0) + str(line) + "\n"
        return new_line
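The idea behind the indent change can also be sketched without the class. This is a simplified variant, not the PrettifyPage logic above: it assumes the input is already indented one space per level (as BeautifulSoup's prettify output is) and just multiplies each line's leading spaces. The function name widen_indent and the sample string are made up for the example.

```python
def widen_indent(pretty_text, indent):
    """Re-indent text that uses one space per level to `indent` spaces per level."""
    lines = []
    for line in pretty_text.split("\n"):
        stripped = line.lstrip(" ")
        depth = len(line) - len(stripped)  # leading spaces == nesting level
        lines.append(" " * (depth * indent) + stripped)
    return "\n".join(lines)


# Hypothetical one-space-per-level input, shaped like prettify() output.
sample = "<table>\n <tr>\n  <td>\n   x\n  </td>\n </tr>\n</table>"
widened = widen_indent(sample, 2)
print(widened)
```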


Messages In This Thread
RE: web scraping to csv formatting problems - by Larz60+ - Jul-04-2019, 02:00 AM
