(Jun-07-2017, 09:11 PM)slinkplink Wrote: returns mostly text but with a lot of newline ("\n") tags and without formatting. Basically, my question now is this: How do I extract this data nicely into an excel table or .csv format?
That's because it's getting all the text from the table... and whitespace is considered text. You can use str.strip() to remove the leading and trailing whitespace from each row's text, which leaves you with clean rows.
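Here is a rough sketch of the str.strip() idea using only the standard library. The sample text, the two-column width, and the helper names (clean_cells, to_rows, rows_to_csv) are made up for illustration; your scraped text and column count will differ.

```python
import csv
import io

# Hypothetical sample standing in for the text scraped from a table,
# full of newlines and padding like the output described above.
raw_text = "\n  Account \n\n  Address \n\n\n 123 \n\n 100 Westheimer \n\n"


def clean_cells(text):
    """Split on newlines, strip each piece, and drop the empty ones."""
    return [piece.strip() for piece in text.split("\n") if piece.strip()]


def to_rows(cells, width):
    """Group a flat list of cells into rows of `width` columns."""
    return [cells[i:i + width] for i in range(0, len(cells), width)]


def rows_to_csv(rows):
    """Serialize rows to CSV text (write to a file object instead if needed)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


cells = clean_cells(raw_text)
rows = to_rows(cells, width=2)  # assumed column count for this sample
csv_text = rows_to_csv(rows)
print(csv_text)
```

Grouping by a fixed width only works if every row has the same number of cells, so for a real page it is usually safer to strip the text cell by cell while walking the table rows.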
I'm sure there are already pre-made HTML-to-CSV scripts if you search for them... this is the first result from Google:
https://gist.github.com/n8henrie/08a31f02fd1282d12b75
test.py
#!/usr/bin/env python3

"""html_to_csv.py
Prompts for a URL, displays HTML tables from that page, then converts
the selected table to a csv file.
"""

import sys
import pandas

if sys.version[0] == '2':
    input = raw_input

url = input("Enter the URL: ")

tables = pandas.io.html.read_html(url)
'''
for index, table in enumerate(tables):
    print("Table {}:".format(index + 1))
    print(table.head() + '\n')
    print('-' * 60)
    print('\n')
'''

choice = int(input("Enter the number of the table you want: ")) - 1
filename = input("Enter a filename (.csv extension assumed): ") + '.csv'

with open(filename, 'w') as outfile:
    tables[choice].to_csv(outfile, index=False, header=False)
Output:
metulburr@ubuntu:~$ python test.py
Enter the URL: https://public.hcad.org/records/Real/AdvancedResults.asp?name=&desc=&stname=westheimer&bstnum=&estnum=&zip=&kmap=&facet=&isd=&StateCategory=F1&BSC=&LUC=&nbhd=&val=&valrange=.10&sqft=&sqftrange=.10&Sort=Account&bstep=0&taxyear=2017&Search=Search
Enter the number of the table you want: 3
Enter a filename (.csv extension assumed): stuff