Jul-02-2019, 08:38 PM
(This post was last modified: Jul-02-2019, 08:38 PM by bluethundr.)
Hello,
I am trying to scrape a web page and write the result to a CSV file. I am able to get the content I want into the CSV. However, the same content is repeated on every row, and the data for all of the accounts is written across each row instead of one account per line under the headers.
This is the result I'm getting: CSV Output
The CSV should list the accounts one per line, going down and not across as in this example. This is the original wiki page that I'm scraping (had to block out company info): Original Wiki Page
This is the code I am using:
import csv
import os
import requests
from requests import get
from requests.exceptions import RequestException
from contextlib import closing
from bs4 import BeautifulSoup

output_dir = os.path.join('..', 'output_files', 'aws_accounts_list')
source = 'aws_wiki_page'
destination = os.path.join(output_dir, source + '.csv')
url = 'https://wiki.us.cworld.company.com/display/6TO/AWS+Accounts'
page = requests.get(url, auth=('me', 'secret'))
headers = ['Company Account Name', 'AWS Account Name', 'Description', 'LOB', 'AWS Account Number', 'Connected to Homebase', 'Peninsula or Island', 'URL', 'Owner', 'Engagement Code', 'CloudOps Access Type']
soup = BeautifulSoup(page.text, 'lxml')
rows = []
for tr in soup.select('tr'):
    rows.append([td.text for td in soup.select('td')])

with open(destination, 'w+', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(headers)
    for row in rows:
        writer.writerow(row)
        print(row)

What am I doing wrong?
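A likely cause, judging only from the code as posted: inside the loop, soup.select('td') is called on the whole document rather than on the current tr, so every iteration collects every cell on the page. That would produce one very wide row, repeated once per table row. The sketch below shows the per-row alternative. It is a minimal illustration, not a drop-in fix: SAMPLE_HTML, table_rows and rows_to_csv are hypothetical names, the sample table is made up because the real wiki markup is not shown, and it uses the built-in 'html.parser' and an in-memory buffer instead of 'lxml' and a file so that it runs on its own.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical stand-in for the wiki table; the real page is not shown in the post.
SAMPLE_HTML = """
<table>
  <tr><th>Company Account Name</th><th>AWS Account Number</th></tr>
  <tr><td>acct-one</td><td>111111111111</td></tr>
  <tr><td>acct-two</td><td>222222222222</td></tr>
</table>
"""


def table_rows(html):
    """Return one list of cell texts per <tr> that contains <td> cells."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for tr in soup.select('tr'):
        # Select cells from this row only (tr), not from the whole document (soup).
        cells = [td.get_text(strip=True) for td in tr.select('td')]
        if cells:  # rows made only of <th> cells yield nothing here, so skip them
            rows.append(cells)
    return rows


def rows_to_csv(headers, rows):
    """Write the header line and one CSV line per row into a string."""
    buffer = io.StringIO(newline='')
    writer = csv.writer(buffer)
    writer.writerow(headers)
    writer.writerows(rows)
    return buffer.getvalue()


rows = table_rows(SAMPLE_HTML)
csv_text = rows_to_csv(['Company Account Name', 'AWS Account Number'], rows)
print(csv_text)
```

In the posted script, the equivalent change would be to replace soup.select('td') with tr.select('td') inside the for loop. Whether empty rows also need to be skipped depends on how the wiki marks up its header row, which is an assumption here.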