Python Forum
BeautifulSoup Parsing Error
#4
(Jun-07-2017, 09:11 PM)slinkplink Wrote: returns mostly text but with a lot of newline ("\n") tags and without formatting. Basically, my question now is this: How do I extract this data nicely into an excel table or .csv format?
That's because it's getting all the text from the table, and whitespace counts as text. You can call str.strip() on each row's text to remove the surrounding whitespace, then handle the rows one at a time.
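
A rough sketch of that cleanup, using only the standard library. The raw_rows strings are made-up sample data shaped like the newline-heavy text you describe, and clean_cells is a helper name I invented for the example, not part of BeautifulSoup:

```python
import csv
import io


def clean_cells(raw_row_text):
    """Split one row's text on newlines and keep the non-empty, stripped pieces."""
    return [piece.strip() for piece in raw_row_text.split("\n") if piece.strip()]


# Hypothetical sample text for two table rows (not real data from the site)
raw_rows = [
    "\n  Account \n\n Owner \n\n  Address\n",
    "\n 123 \n\n  DOE JOHN \n\n 100 MAIN ST \n",
]

rows = [clean_cells(text) for text in raw_rows]

# Write to an in-memory buffer here; open a real file with newline="" instead
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Note that this drops empty cells completely, so a row with a blank column would shift left. If the table has blank cells, it is probably safer to strip each cell separately instead of splitting the whole row's text.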

I'm sure there are already pre-made HTML-to-CSV scripts if you search for them. This is the first result from Google:
https://gist.github.com/n8henrie/08a31f02fd1282d12b75

test.py
#!/usr/bin/env python3
"""html_to_csv.py
Prompts for a URL, displays HTML tables from that page, then converts
the selected table to a csv file.
"""

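import sys
import pandas

# Python 2 compatibility: use raw_input so typed text is returned as a string
if sys.version_info[0] == 2:
    input = raw_input

url = input("Enter the URL: ")
# read_html returns a list with one DataFrame per <table> found on the page
tables = pandas.read_html(url)

# Optional preview of each table (disabled); uncomment to see them before choosing
# for index, table in enumerate(tables):
#     print("Table {}:".format(index + 1))
#     print(table.head())
#     print('-' * 60)
#     print('\n')

# Tables are numbered from 1 for the user, but the list is indexed from 0
choice = int(input("Enter the number of the table you want: ")) - 1
filename = input("Enter a filename (.csv extension assumed): ") + '.csv'

# Let pandas open the file itself so line endings are handled correctly
tables[choice].to_csv(filename, index=False, header=False)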
Output:
metulburr@ubuntu:~$ python test.py
Enter the URL: https://public.hcad.org/records/Real/AdvancedResults.asp?name=&desc=&stname=westheimer&bstnum=&estnum=&zip=&kmap=&facet=&isd=&StateCategory=F1&BSC=&LUC=&nbhd=&val=&valrange=.10&sqft=&sqftrange=.10&Sort=Account&bstep=0&taxyear=2017&Search=Search
Enter the number of the table you want: 3
Enter a filename (.csv extension assumed): stuff
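
One thing to watch with the "number of the table" prompt: the script subtracts 1 from whatever is typed, so entering 0 becomes index -1, which Python quietly treats as the last table. A small guard like the one below would catch that. The function name table_index is my own invention for this sketch, not something from the gist:

```python
def table_index(choice_text, table_count):
    """Convert a 1-based menu choice to a 0-based list index, or raise ValueError."""
    number = int(choice_text)  # raises ValueError for non-numeric input
    if not 1 <= number <= table_count:
        raise ValueError("choose a number from 1 to {}".format(table_count))
    return number - 1


# Example: the user typed "3" and the page had 5 tables
print(table_index("3", 5))
```

In the script, that would replace the bare int(input(...)) - 1 line, passing len(tables) as the count.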


Messages In This Thread
BeautifulSoup Parsing Error - by slinkplink - Jun-07-2017, 05:12 PM
RE: BeautifulSoup Parsing Error - by nilamo - Jun-07-2017, 05:58 PM
RE: BeautifulSoup Parsing Error - by slinkplink - Jun-07-2017, 09:11 PM
RE: BeautifulSoup Parsing Error - by metulburr - Jun-07-2017, 09:57 PM
RE: BeautifulSoup Parsing Error - by slinkplink - Jun-08-2017, 03:06 PM
RE: BeautifulSoup Parsing Error - by Larz60+ - Jun-08-2017, 08:49 PM
RE: BeautifulSoup Parsing Error - by seco - Feb-12-2018, 02:55 PM
