Python Forum
Python/BeautiifulSoup. list of urls ->parse->extract data to csv. getting ERROR
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Python/BeautiifulSoup. list of urls ->parse->extract data to csv. getting ERROR
#1
I have a list of urls in a csv file (I can either host said file on my local machine or online). I need to pull biz name, address, and phone # from the web pages in the list. I have all of the correct class names. I want to extract this data to a csv with the aforementioned columns.

From the csv:

https://slicelife.com/restaurants/wi/mil...aukee/menu
https://slicelife.com/restaurants/nj/nor...hvale/menu
https://slicelife.com/restaurants/mn/man...pizza/menu
https://slicelife.com/restaurants/pa/new...k-hut/menu


When I run the code, it will create a csv with the desired column headers, but no data due to errors. I CAN pull data from the scraped urls one at a time like this:


# locationRawData = soup.find('div', attrs={"class": "f19xeu2d"}).text.encode('utf-8'), 
# pizzeriaName = soup.find('h1', attrs={"class": "f13p7rsj"}).text.encode('utf-8'),
# address = soup.find('address', attrs={"class": "f1lfckhr"}).text.encode('utf-8'),
# phoneNumber = soup.find('button', attrs={"class": "f12gt8lx"}).text.encode('utf-8'),
I have tried:

from bs4 import BeautifulSoup
import requests
import json
import csv
from urllib.request import urlopen


TrattoriArray = []
with open('aliveSlice.csv','r') as csvf: # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        TrattoriArray.append(url) # Add each url to list contents

for url in TrattoriArray:  # Parse through each url in the list.
    page = urlopen(url[0]).read()
    content = BeautifulSoup(page.content, "html.parser")

pizzaArray = []
for pizzeria in content.findAll('div', attrs={"class": "f19xeu2d"}):
    pizzeriaObject = {
        "pizzeriaName": pizzeria.find('h1', attrs={"class": "f13p7rsj"}).text.encode('utf-8'),
        "address": pizzeria.find('address', attrs={"class": "f1lfckhr"}).text.encode('utf-8'),
        "phoneNumber": pizzeria.find('rc-c2d-number', attrs={"span": "rc-c2d-number"}).text.encode('utf-8'),

    }
    pizzaArray.append(pizzeriaObject)
with open('pizzeriaData.json', 'w') as outfile:
    json.dump(pizzaArray, outfile)
and

import requests
from bs4 import BeautifulSoup
import csv

with open('aliveSCRAPE.csv', newline='') as f_urls, open('output.csv', 'w', newline='') as f_output:
    csv_urls = csv.reader(f_urls)
    csv_output = csv.writer(f_output)
    csv_output.writerow(['locationRawData' , 'pizzeriaName' , 'address', 'Phone'])

    for line in csv_urls:
        r = requests.get(line[0]).text
        soup = BeautifulSoup(r.content, 'lxml')

        locationRawData = soup.find('h1')
        print('RAW :', locationRawData.text)

        pizzeriaName = soup.find('h1', class_='f13p7rsj').text
        pizzeria_name = pizzeria.split(':')
        print('pizzeriaName:', pizzeria_name[1])

        address = soup.find_all('address', class_='f1lfckhr'})
        print('Address :', address[2].text)

        phoneNumber = soup.find_all('button', class_='f12gt8lx')
        print('Phone :', phoneNumber[3].text)

        locationRawData = soup.find_all('div', class_='f19xeu2d'})
        print('RAW :', locationRawData[4].text)

        csv_output.writerow([locationRawData.text, pizzeria_name[1], address[2].text, phoneNumber[3].text])
And...a few other methods, which is the easiest? This is literally the first thing I have ever programmed in Python.
...\Desktop\scrapeYourPlate\test\Code>Python scrape.py
RAW : Bakers Buck Hut
Traceback (most recent call last):
  File "scrape.py", line 98, in <module>
    print('pizzeriaName:', pizzeriaName[1].text)
  File ...AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\bs4\element.py", line 1016, in __getitem__
    return self.attrs[key]
KeyError: 1
python python-3.x beautifulsoup



ERROR:


CodeNinjaGrasshopper
255 bronze badges
File "scrape.py", line 98, in <module> print('pizzeriaName:', pizzeriaName[1].text) File "C:\...\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\bs4\element.py", line 1016, in getitem return self.attrs[key] KeyError: 1 
Reply


Messages In This Thread
Python/BeautiifulSoup. list of urls ->parse->extract data to csv. getting ERROR - by IanTheLMT - Jul-03-2019, 01:52 PM

Possibly Related Threads…
Thread Author Replies Views Last Post
  What is the error of not being able to pull data in this code? i didn't see an error? TestPerson 2 1,183 Sep-30-2022, 02:36 PM
Last Post: DeaD_EyE
  BeautifulSoup not parsing other URLs giddyhead 0 1,168 Feb-23-2022, 05:35 PM
Last Post: giddyhead
  Extract Href URL and Text From List knight2000 2 8,625 Jul-08-2021, 12:53 PM
Last Post: knight2000
  Extract data from sports betting sites nestor 3 5,549 Mar-30-2021, 04:37 PM
Last Post: Larz60+
  Extract data from a table Bob_M 3 2,627 Aug-14-2020, 03:36 PM
Last Post: Bob_M
  Need logic on how to scrap 100K URLs goodmind 2 2,569 Jun-29-2020, 09:53 AM
Last Post: goodmind
  Extract data with Selenium and BeautifulSoup nestor 3 3,816 Jun-06-2020, 01:34 AM
Last Post: Larz60+
  Parse a URL list stored in a CSV paulfearn100 0 1,663 May-07-2020, 02:26 PM
Last Post: paulfearn100
  Extract json-ld schema markup data and store in MongoDB Nuwan16 0 2,412 Apr-05-2020, 04:06 PM
Last Post: Nuwan16
  Extract data from a webpage cycloneseb 5 2,818 Apr-04-2020, 10:17 AM
Last Post: alekson

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020