Axel - Very interesting. Thank you!
Do you mind stepping through some questions/assumptions?
As I understand it, this builds a dataset from the table: it takes every row in the table, splits the row's string after a space to create a new line, and then appends each row to the list.
```python
datasets = []
mytable = table.find_all("tr")#[1:]
for row in mytable:
    text = str(row.get_text()).split('\n')
    datasets.append(text)
```

I'm having a real hard time following this one - and how did the headers get there?

```python
_len = len(datasets)
for x in range(_len -1):
    t = datasets[x]
    print((t[1] + '\t' + t[2] + '\t' + t[5]).expandtabs(30))
```

I have learned some code for csv writer. Below is a sample.

```python
with open('test_cfbstats.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Date', 'Opponent'])
    writer.writerows(data)
```

How would you suggest modifying for use in your code? I'm not sure if the writerow would be necessary, and the writerows would change to datasets?
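On the two questions above, a minimal sketch of one way the csv.writer sample might be adapted. Two things can be read straight from the code: `split('\n')` breaks each row's text on newlines, and because the `[1:]` slice is commented out, the first `tr` (the header row) is already the first entry in `datasets`, so a separate `writerow` for headers is probably not needed. The sample `datasets` below is an assumption: its values are copied from the output quoted in this thread, and the empty strings at each end of a row are a guess at what `split('\n')` leaves behind. `clean_rows` and `write_schedule` are hypothetical helper names.

```python
import csv

# Sample rows shaped like what the split('\n') loop is assumed to produce.
# Values come from the output quoted in this thread; the empty strings at
# each end are an assumption about the page's markup.
datasets = [
    ['', 'Date', 'Opponent', 'Result', 'Game Time', 'Attendance', ''],
    ['', '09/03/18', 'Virginia Tech', 'L 3-24', '3:12', '75,237', ''],
    ['', '09/08/18', 'Samford', 'W 36-26', '3:51', '72,239', ''],
    ['', '@ : Away, + : Neutral Site', ''],
]


def clean_rows(rows):
    """Strip whitespace and drop empty cells from each row."""
    return [[cell.strip() for cell in row if cell.strip()] for row in rows]


def write_schedule(rows, path):
    """Write the cleaned rows to a CSV file; the header is already row 0."""
    # Skip the final row (the legend line), mirroring range(_len - 1) above.
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerows(clean_rows(rows[:-1]))


write_schedule(datasets, 'test_cfbstats.csv')
```

So `writerows(data)` would become `writerows(...)` over the cleaned `datasets`, and the `writerow([...])` header call can go. One caveat: dropping empty cells would shift columns if a real row ever had a blank value in the middle, so that filter may need adjusting for the live data.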
```
  File "cfbstats_larz.py", line 9, in <module>
    soup = BeautifulSoup(page, 'html.parser')
NameError: name 'page' is not defined
```

Is it something I did?
(Dec-15-2018, 11:10 PM)Larz60+ Wrote: I did it a bit differently, same results:

```python
import requests
from bs4 import BeautifulSoup
import csv
import os


url = 'http://www.cfbstats.com/2018/team/234/index.html'
r = requests.get(url)
soup = BeautifulSoup(page, 'html.parser')
table = soup.findAll("table",{"class": "team-schedule"})[0]
trs = table.find_all('tr')
header = []
for n, tr in enumerate(trs):
    if n == 0:
        # Get Header
        ths = tr.find_all('th')
        for th in ths:
            header.append(th.text.strip())
        for item in header:
            print('{:22}'.format(item), end='')
        print()
        continue
    else:
        game_item = []
        tds = tr.find_all('td')
        for td in tds:
            game_item.append(td.text.strip())
        for item in game_item:
            print('{:22}'.format(item), end='')
        print()
```

Output:

```
Date                  Opponent              Result                Game Time             Attendance
09/03/18              Virginia Tech         L 3-24                3:12                  75,237
09/08/18              Samford               W 36-26               3:51                  72,239
09/15/18              @ 17 Syracuse         L 7-30                3:37                  37,457
09/22/18              Northern Ill.         W 37-19               3:34                  65,633
09/29/18              @ Louisville          W 28-24               3:27                  52,798
10/06/18              @ Miami (Fla.)        L 27-28               4:01                  65,490
10/20/18              Wake Forest           W 38-17               3:34                  67,274
10/27/18              2 Clemson             L 10-59               3:47                  68,403
11/03/18              @ North Carolina St.  L 28-47               3:33                  57,600
11/10/18              @ 3 Notre Dame        L 13-42               3:22                  77,622
11/17/18              Boston College        W 22-21               3:31                  57,274
11/24/18              10 Florida            L 14-41               3:27                  71,953
@ : Away, + : Neutral Site
```
My apologies for combining the replies, I don't know what happened.
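On the NameError: it looks like it comes from the quoted script itself rather than from anything done while running it. The response is stored in `r`, but `BeautifulSoup` is handed `page`, a name that is never assigned. A minimal sketch of the likely fix is to pass the response text instead. To keep the example runnable offline, it parses a small inline string (`SAMPLE_HTML`, a hypothetical stand-in modeled on the `team-schedule` table class and the first rows of the quoted output); `parse_schedule` is likewise a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for r.text; the real page's markup may differ.
SAMPLE_HTML = """
<table class="team-schedule">
<tr><th>Date</th><th>Opponent</th><th>Result</th></tr>
<tr><td>09/03/18</td><td>Virginia Tech</td><td>L 3-24</td></tr>
<tr><td>09/08/18</td><td>Samford</td><td>W 36-26</td></tr>
</table>
"""


def parse_schedule(html):
    """Return (header, games) from the first team-schedule table."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find_all('table', {'class': 'team-schedule'})[0]
    header, games = [], []
    for n, tr in enumerate(table.find_all('tr')):
        if n == 0:
            # First row holds the <th> header cells
            header = [th.text.strip() for th in tr.find_all('th')]
        else:
            games.append([td.text.strip() for td in tr.find_all('td')])
    return header, games


# With the live site this would be:
#   r = requests.get(url)
#   header, games = parse_schedule(r.text)   # r.text, not the undefined `page`
header, games = parse_schedule(SAMPLE_HTML)
for row in [header] + games:
    print(''.join('{:22}'.format(item) for item in row))
```

In the original script the one-line equivalent would be changing `BeautifulSoup(page, 'html.parser')` to `BeautifulSoup(r.text, 'html.parser')`, assuming the request itself succeeds.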