Python Forum
Python/BeautifulSoup. list of urls ->parse->extract data to csv. getting ERROR
#1
I have a list of urls in a csv file (I can either host said file on my local machine or online). I need to pull biz name, address, and phone # from the web pages in the list. I have all of the correct class names. I want to extract this data to a csv with the aforementioned columns.

From the csv:

https://slicelife.com/restaurants/wi/mil...aukee/menu
https://slicelife.com/restaurants/nj/nor...hvale/menu
https://slicelife.com/restaurants/mn/man...pizza/menu
https://slicelife.com/restaurants/pa/new...k-hut/menu


When I run the code, it will create a csv with the desired column headers, but no data due to errors. I CAN pull data from the scraped urls one at a time like this:


locationRawData = soup.find('div', attrs={"class": "f19xeu2d"}).text.encode('utf-8')
pizzeriaName = soup.find('h1', attrs={"class": "f13p7rsj"}).text.encode('utf-8')
address = soup.find('address', attrs={"class": "f1lfckhr"}).text.encode('utf-8')
phoneNumber = soup.find('button', attrs={"class": "f12gt8lx"}).text.encode('utf-8')
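One thing worth guarding against: when a selector misses, `soup.find()` returns None and the chained `.text` raises AttributeError, which is one way a batch run can die before writing any rows. A small guard helper could look like this (a sketch; `safe_text` and `FakeTag` are made-up names, and `FakeTag` only stands in for a bs4 Tag so the snippet runs offline):

```python
def safe_text(tag, default=''):
    """Return tag.text stripped, or default when find() returned None."""
    return tag.text.strip() if tag is not None else default


class FakeTag:
    """Minimal stand-in for a bs4 Tag; only .text is used here."""
    def __init__(self, text):
        self.text = text


name = safe_text(FakeTag('  Bakers Buck Hut '))  # a tag that was found
phone = safe_text(None, 'n/a')                   # a selector that matched nothing
```

With bs4, the call site becomes `safe_text(soup.find('h1', class_='f13p7rsj'))`, so a missing element yields an empty cell instead of a crash.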
I have tried:

from bs4 import BeautifulSoup
import json
import csv
from urllib.request import urlopen


TrattoriArray = []
with open('aliveSlice.csv', 'r') as csvf:  # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        TrattoriArray.append(url)  # Add each row (a one-element list) to the list

pizzaArray = []
for url in TrattoriArray:  # Parse each url in the list.
    page = urlopen(url[0]).read()  # read() returns bytes; pass them straight to BeautifulSoup
    content = BeautifulSoup(page, "html.parser")

    # Scrape inside the loop, otherwise only the last page is parsed
    for pizzeria in content.findAll('div', attrs={"class": "f19xeu2d"}):
        pizzeriaObject = {
            # .text (not .text.encode()) keeps the values JSON-serializable
            "pizzeriaName": pizzeria.find('h1', attrs={"class": "f13p7rsj"}).text,
            "address": pizzeria.find('address', attrs={"class": "f1lfckhr"}).text,
            # tag name and attribute were swapped in the original
            "phoneNumber": pizzeria.find('span', attrs={"class": "rc-c2d-number"}).text,
        }
        pizzaArray.append(pizzeriaObject)

with open('pizzeriaData.json', 'w') as outfile:
    json.dump(pizzaArray, outfile)
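Since the stated goal is a CSV rather than JSON, the same list of dicts could also be written with `csv.DictWriter` (a sketch using an in-memory buffer and made-up sample data, so it runs without scraping anything):

```python
import csv
import io

pizzaArray = [
    {"pizzeriaName": "Bakers Buck Hut", "address": "1 Main St", "phoneNumber": "555-0100"},
]

buf = io.StringIO()  # in practice: open('pizzeriaData.csv', 'w', newline='')
writer = csv.DictWriter(buf, fieldnames=["pizzeriaName", "address", "phoneNumber"])
writer.writeheader()          # header row from fieldnames
writer.writerows(pizzaArray)  # one row per dict
csv_text = buf.getvalue()
```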
and

import requests
from bs4 import BeautifulSoup
import csv

with open('aliveSCRAPE.csv', newline='') as f_urls, open('output.csv', 'w', newline='') as f_output:
    csv_urls = csv.reader(f_urls)
    csv_output = csv.writer(f_output)
    csv_output.writerow(['locationRawData', 'pizzeriaName', 'address', 'Phone'])

    for line in csv_urls:
        r = requests.get(line[0])  # keep the Response; a str from .text has no .content
        soup = BeautifulSoup(r.content, 'lxml')

        locationRawData = soup.find('h1')
        print('RAW :', locationRawData.text)

        pizzeriaName = soup.find('h1', class_='f13p7rsj').text
        pizzeria_name = pizzeriaName.split(':')  # was pizzeria.split(':'), an undefined name
        print('pizzeriaName:', pizzeria_name[1])

        address = soup.find_all('address', class_='f1lfckhr')
        print('Address :', address[2].text)

        phoneNumber = soup.find_all('button', class_='f12gt8lx')
        print('Phone :', phoneNumber[3].text)

        rawDivs = soup.find_all('div', class_='f19xeu2d')  # find_all returns a list
        print('RAW :', rawDivs[4].text)

        csv_output.writerow([rawDivs[4].text, pizzeria_name[1], address[2].text, phoneNumber[3].text])
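For what it's worth, the traceback that follows comes from indexing a single Tag: `pizzeriaName[1]` on a Tag is an attribute lookup, hence `KeyError: 1`. The hard-coded list indices (`address[2]`, `phoneNumber[3]`) fail similarly when `find_all()` returns fewer matches than expected. One way to make each row tolerant of both (a sketch; `pick` and `FakeTag` are made-up names, with `FakeTag` standing in for a bs4 Tag so this runs offline):

```python
def pick(items, index, default='n/a'):
    """Return items[index].text stripped, or default when the list is too short."""
    try:
        return items[index].text.strip()
    except IndexError:
        return default


class FakeTag:
    """Minimal stand-in for a bs4 Tag; only .text is used here."""
    def __init__(self, text):
        self.text = text


tags = [FakeTag(' Pizza Place '), FakeTag('123 Main St')]
row = [pick(tags, 0), pick(tags, 1), pick(tags, 5)]  # index 5 is out of range
```

With bs4, `tags` would be the result of `soup.find_all(...)`, and a short page produces a placeholder cell instead of a traceback.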
And... a few other methods. Which is the easiest? This is literally the first thing I have ever programmed in Python.
...\Desktop\scrapeYourPlate\test\Code>Python scrape.py
RAW : Bakers Buck Hut
Traceback (most recent call last):
  File "scrape.py", line 98, in <module>
    print('pizzeriaName:', pizzeriaName[1].text)
  File ...AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\bs4\element.py", line 1016, in __getitem__
    return self.attrs[key]
KeyError: 1
#2
This page has enough JavaScript in it that I would get the first page using Selenium, then use BeautifulSoup to get the details.
There are examples of this on this forum under Tutorials/Web Scraping (by snippsat).
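The usual pattern behind that suggestion is to let Selenium render the page and hand `driver.page_source` to a parser. Since neither a browser nor bs4 can run in this snippet, the Selenium lines are shown as comments and the class-based text extraction is demonstrated with the stdlib `html.parser` on a canned string instead (the class name comes from the posts above; in real use you would parse `driver.page_source` with BeautifulSoup):

```python
from html.parser import HTMLParser

# In practice (requires selenium plus a browser driver):
#   from selenium import webdriver
#   from bs4 import BeautifulSoup
#   driver = webdriver.Chrome()
#   driver.get(url)
#   soup = BeautifulSoup(driver.page_source, 'html.parser')
#   driver.quit()


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element carrying a given class attribute."""
    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0   # > 0 while inside a matching element
        self.texts = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get('class') or '').split()
        if self.depth or self.wanted_class in classes:
            self.depth += 1
            if self.depth == 1:
                self.texts.append('')

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[-1] += data


html = '<div><h1 class="f13p7rsj">Bakers Buck Hut</h1></div>'
parser = ClassTextExtractor('f13p7rsj')
parser.feed(html)
name = parser.texts[0].strip() if parser.texts else 'not found'
```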
#3
(Jul-04-2019, 01:44 AM)Larz60+ Wrote: This page has enough JavaScript in it that I would get the first page using Selenium, then use BeautifulSoup to get the details.
There are examples of this on this forum under Tutorials/Web Scraping (by snippsat).

Okay, I will research that! Thank you for your reply! Right now I feel like I don't know what I don't know, and I don't even know what to search for unless I am pointed in the right direction, as you took the time to do. Thanks!