Python Forum

Full Version: Python/BeautifulSoup. list of urls ->parse->extract data to csv. getting ERROR
I have a list of urls in a csv file (I can either host said file on my local machine or online). I need to pull biz name, address, and phone # from the web pages in the list. I have all of the correct class names. I want to extract this data to a csv with the aforementioned columns.

From the csv:

https://slicelife.com/restaurants/wi/mil...aukee/menu
https://slicelife.com/restaurants/nj/nor...hvale/menu
https://slicelife.com/restaurants/mn/man...pizza/menu
https://slicelife.com/restaurants/pa/new...k-hut/menu


When I run the code, it will create a csv with the desired column headers, but no data due to errors. I CAN pull data from the scraped urls one at a time like this:


# locationRawData = soup.find('div', attrs={"class": "f19xeu2d"}).text.encode('utf-8'), 
# pizzeriaName = soup.find('h1', attrs={"class": "f13p7rsj"}).text.encode('utf-8'),
# address = soup.find('address', attrs={"class": "f1lfckhr"}).text.encode('utf-8'),
# phoneNumber = soup.find('button', attrs={"class": "f12gt8lx"}).text.encode('utf-8'),
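Since `find()` returns `None` when a class is missing, calling `.text` on the result of a miss raises `AttributeError`, which is one way a per-url run can die. A minimal guard, as a sketch (the class names are the ones quoted above; `safe_text` is a hypothetical helper, not part of the original code):

```python
from bs4 import BeautifulSoup

def safe_text(parent, name, class_name):
    """Return the stripped text of the first matching tag, or '' if it is absent."""
    tag = parent.find(name, attrs={"class": class_name})
    return tag.get_text(strip=True) if tag else ""

# Demo on an inline snippet instead of a live page:
html = '<div class="f19xeu2d"><h1 class="f13p7rsj">Marios Pizza</h1></div>'
soup = BeautifulSoup(html, "html.parser")
print(safe_text(soup, "h1", "f13p7rsj"))       # Marios Pizza
print(safe_text(soup, "address", "f1lfckhr"))  # '' -- tag not present, no crash
```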
I have tried:

from bs4 import BeautifulSoup
import csv
import json
from urllib.request import urlopen


TrattoriArray = []
with open('aliveSlice.csv', 'r') as csvf:  # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        TrattoriArray.append(url)  # Add each url row to the list

pizzaArray = []
for url in TrattoriArray:  # Parse through each url in the list.
    page = urlopen(url[0]).read()  # .read() already returns the raw bytes
    content = BeautifulSoup(page, "html.parser")  # bytes have no .content attribute

    # Collect results inside the loop; otherwise only the last page is parsed.
    for pizzeria in content.find_all('div', attrs={"class": "f19xeu2d"}):
        pizzeriaObject = {
            # Keep .text as str: json.dump() cannot serialize the bytes
            # that .encode('utf-8') produces.
            "pizzeriaName": pizzeria.find('h1', attrs={"class": "f13p7rsj"}).text,
            "address": pizzeria.find('address', attrs={"class": "f1lfckhr"}).text,
            # Tag name and attribute were swapped here; presumably the number
            # lives in a span with class "rc-c2d-number".
            "phoneNumber": pizzeria.find('span', attrs={"class": "rc-c2d-number"}).text,
        }
        pizzaArray.append(pizzeriaObject)

with open('pizzeriaData.json', 'w') as outfile:
    json.dump(pizzaArray, outfile)
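The per-page extraction from that first attempt can also be factored into its own function so missing tags become `None` instead of an exception. A sketch under the assumption that the class names from the question are still current (`extract_record` and `scrape_all` are hypothetical names):

```python
import csv
import json
import requests
from bs4 import BeautifulSoup

def extract_record(soup):
    """Pull the three fields from one parsed page; missing tags become None."""
    def grab(name, cls):
        tag = soup.find(name, attrs={"class": cls})
        return tag.get_text(strip=True) if tag else None
    return {
        "pizzeriaName": grab("h1", "f13p7rsj"),
        "address": grab("address", "f1lfckhr"),
        "phoneNumber": grab("button", "f12gt8lx"),
    }

def scrape_all(url_csv="aliveSlice.csv", out_json="pizzeriaData.json"):
    """Fetch each url from the input csv and dump all records as JSON."""
    records = []
    with open(url_csv, newline="") as f:
        for row in csv.reader(f):
            resp = requests.get(row[0], timeout=10)
            soup = BeautifulSoup(resp.text, "html.parser")
            records.append(extract_record(soup))
    with open(out_json, "w") as out:
        json.dump(records, out, indent=2)
```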
and

import requests
from bs4 import BeautifulSoup
import csv

with open('aliveSCRAPE.csv', newline='') as f_urls, open('output.csv', 'w', newline='') as f_output:
    csv_urls = csv.reader(f_urls)
    csv_output = csv.writer(f_output)
    csv_output.writerow(['locationRawData', 'pizzeriaName', 'address', 'Phone'])

    for line in csv_urls:
        r = requests.get(line[0])  # keep the Response object; .text came off it too early before
        soup = BeautifulSoup(r.text, 'lxml')

        locationRawData = soup.find('h1')
        print('RAW :', locationRawData.text)

        pizzeriaName = soup.find('h1', class_='f13p7rsj').text
        pizzeria_name = pizzeriaName.split(':')  # was pizzeria.split(':'), a NameError
        print('pizzeriaName:', pizzeria_name[-1])  # [-1] still works when there is no ':'

        address = soup.find_all('address', class_='f1lfckhr')  # stray '}' removed
        print('Address :', address[0].text)  # first match; fixed indices like [2] can be out of range

        phoneNumber = soup.find_all('button', class_='f12gt8lx')
        print('Phone :', phoneNumber[0].text)

        locationRawData = soup.find_all('div', class_='f19xeu2d')  # stray '}' removed
        print('RAW :', locationRawData[0].text)  # a ResultSet must be indexed before .text

        csv_output.writerow([locationRawData[0].text, pizzeria_name[-1],
                             address[0].text, phoneNumber[0].text])
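The fragile part of that second attempt is the hard-coded indices (`address[2]`, `phoneNumber[3]`, `locationRawData[4]`): `find_all()` returns however many matches the page happens to have, so a fixed index can be out of range. One defensive sketch (`first_text` is a hypothetical helper, and the class names are taken from the question):

```python
from bs4 import BeautifulSoup

def first_text(soup, name, class_name, default=""):
    """Text of the first find_all() match, or the default when nothing matches."""
    matches = soup.find_all(name, class_=class_name)
    return matches[0].get_text(strip=True) if matches else default

html = '<button class="f12gt8lx">555-0100</button>'
soup = BeautifulSoup(html, "html.parser")
print(first_text(soup, "button", "f12gt8lx"))   # 555-0100
print(first_text(soup, "address", "f1lfckhr"))  # '' instead of an IndexError
```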
And...a few other methods, which is the easiest? This is literally the first thing I have ever programmed in Python.
...\Desktop\scrapeYourPlate\test\Code>Python scrape.py
RAW : Bakers Buck Hut
Traceback (most recent call last):
  File "scrape.py", line 98, in <module>
    print('pizzeriaName:', pizzeriaName[1].text)
  File ...AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\bs4\element.py", line 1016, in __getitem__
    return self.attrs[key]
KeyError: 1
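For what it's worth, that `KeyError: 1` is Beautiful Soup telling you `pizzeriaName` is a single `Tag`, not a list: square brackets on a `Tag` look up an HTML *attribute*, so `pizzeriaName[1]` asks for an attribute literally named `1`. Indexing by position only works on the list-like result of `find_all()`:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p class="a">one</p><p class="a">two</p>', "html.parser")

tag = soup.find("p")       # a single Tag -- tag[...] is attribute lookup
tags = soup.find_all("p")  # a ResultSet  -- tags[...] is list indexing

print(tags[1].text)        # two
print(tag["class"])        # ['a'] -- the HTML class attribute
# tag[1] would raise KeyError: 1, exactly like the traceback above
```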
This page has enough JavaScript in it that I would get the page using Selenium, then use Beautiful Soup to get the details.
There are examples of this on this forum under tutorials/web scraping (by snippsat).
(Jul-04-2019, 01:44 AM)Larz60+ Wrote: [ -> ]This page has enough JavaScript in it that I would get the page using Selenium, then use Beautiful Soup to get the details.
There are examples of this on this forum under tutorials/web scraping (by snippsat).

Okay, I will research that! Thank you for your reply! Right now I feel like I don't know what I don't know, and I don't even know what to search for unless I am pointed in the right direction as you took the time to do! Thanks!