Python Forum

Full Version: error in code web scraping
Hi all,
I tried to do web scraping with Selenium's webdriver (chromedriver) and BeautifulSoup.
Here is my code:
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
driver = webdriver.Chrome("/usr/local/bin/chromedriver")
products=[] #List to store name of the product
prices=[] #List to store price of the product
ratings=[] #List to store rating of the product
driver.get("https://www.flipkart.com/search?q=laptop&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=off&as=off")
content = driver.page_source
soup = BeautifulSoup(content)
for a in soup.findAll('a',href=True, attrs={'class':'_31qSD5'}):
    name=a.find('div', attrs={'class':'_3wU53n'})
    price=a.find('div', attrs={'class':'_1vC4OE _2rQ-NK'})
    rating=a.find('div', attrs={'class':'hGSR34 _2beYZw'})
products.append(name.text)
prices.append(price.text)
ratings.append(rating.text) #line 25
df = pd.DataFrame({'Product Name':products,'Price':prices,'Rating':ratings}) 
df.to_csv('products.csv', index=False, encoding='utf-8')
But I get this error:
Error:
AttributeError: 'NoneType' object has no attribute 'text'
I don't understand why.
Which attribute do I have to use in the append calls?

Best regards.
Alexis
It means that one or more of the elements that you are trying to find here (I can't tell which from the incomplete error traceback):
    name=a.find('div', attrs={'class':'_3wU53n'})
    price=a.find('div', attrs={'class':'_1vC4OE _2rQ-NK'})
    rating=a.find('div', attrs={'class':'hGSR34 _2beYZw'})
do not exist, so find returns None for them, and when you then try to access the text attribute on None you get the error.

You should probably double check the class names used.
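
To see which lookup failed, the results can be checked before .text is used. A minimal sketch using only the standard library: missing_fields is a hypothetical helper, and the SimpleNamespace object only stands in for a tag returned by a.find(...), so no browser is needed to run it.

```python
from types import SimpleNamespace


def missing_fields(**found):
    """Return the names of the lookups whose result is None."""
    return [label for label, element in found.items() if element is None]


# Stand-ins for the results of a.find(...); the values are made up.
name = SimpleNamespace(text="Some laptop")
price = SimpleNamespace(text="₹39,990")
rating = None  # nothing matched the rating criteria

print(missing_fields(name=name, price=price, rating=rating))

# Accessing .text on None reproduces the error from the first post.
try:
    rating.text
except AttributeError as exc:
    error_message = str(exc)
print(error_message)
```

Calling the helper inside the loop, before the append lines, would show which class name needs to be checked.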
Hi mlieqo,
Thank you for your answer.
I tried the code without the attribute "text":
products.append(name)
prices.append(price)
ratings.append(rating) 
I no longer get the error, but I don't get anything in the file products.csv

Do you think I have to put this code
products.append(name)
prices.append(price)
ratings.append(rating) 
in the for loop?

Yours.
Alexis
I think you've misunderstood what you've been told. find will return some object and you'll need to use the text attribute to, well, get the text from it. In your case, find is returning None because an element matching the criteria you specified (e.g. a "div" element with a class of "_3wU53n") could not be found. So, None is assigned to your variable, and when you try to access attributes or call methods on it, the same kind of problem will occur.

As mlieqo suggests, you should check the criteria you're using to find the elements: are you using the right tag and class names?
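
One thing worth checking is how the class filter is matched. As far as I know, BeautifulSoup compares a single class name with each of an element's classes, while a string containing a space is compared with the whole class attribute. A small sketch on made-up HTML (the markup is an assumption, not the real Flipkart page), using the built-in html.parser so that lxml is not needed:

```python
from bs4 import BeautifulSoup

# Made-up markup: the rating div carries only the class hGSR34.
html = '<a class="_31qSD5" href="#"><div class="hGSR34">4.2</div></a>'
soup = BeautifulSoup(html, "html.parser")

# A single class name matches the div.
single = soup.find("div", attrs={"class": "hGSR34"})

# The two-class string does not equal the div's class attribute,
# so find() gives back None here.
double = soup.find("div", attrs={"class": "hGSR34 _2beYZw"})

print(single.text)
print(double)
```

If the second lookup prints None on the real page as well, the extra class in the search string is the likely cause.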
As mentioned, you must check more carefully that the class name is correct: it is only hGSR34 and not hGSR34 _2beYZw.
Also, not every product has a rating, so that needs a fix.
You must specify which parser BS should use, or you get a warning message; preferably use lxml.
Since you have a loop, move the append lines into the loop; otherwise the lists only get a single product (the last one found).
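from selenium import webdriver
from bs4 import BeautifulSoup

# Selenium 3 style call with the chromedriver path from the first post;
# newer Selenium versions may need a different call
browser = webdriver.Chrome("/usr/local/bin/chromedriver")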
products = [] #List to store name of the product
prices = [] #List to store price of the product
ratings = [] #List to store rating of the product
browser.get("https://www.flipkart.com/search?q=laptop&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=off&as=off")
content = browser.page_source
soup = BeautifulSoup(content, 'lxml')
for a in soup.findAll('a', href=True, class_="_31qSD5"):
    name = a.find('div', class_="_3wU53n")
    price = a.find('div', class_="_1vC4OE _2rQ-NK")
    rating = a.find('div',class_="hGSR34")
    products.append(name.text)
    prices.append(price.text)
    try:
        ratings.append(rating.text)
    except AttributeError:
        pass
>>> ratings
['4.2',
 '4.4',
 '4.5',
 '4.3',
 '4.4',
 '4',
 '4.2',
 '3.7',
 '4.6',
 '4.4',
 '4.5',
 '4',
 '4.5',
 '4.4',
 '4.7',
 '4',
 '4.5',
 '4.6',
 '4.2',
 '4.6',
 '4.3',
 '4.3',
 '5']
>>> prices
['₹39,990',
 '₹1,20,990',
 '₹35,990',
 '₹56,990',
 '₹52,990',
 '₹59,990',
 '₹32,990',
 '₹39,990',
 '₹43,990',
 '₹52,990',
 '₹59,990',
 '₹60,990',
 '₹35,990',
 '₹39,990',
 '₹38,990',
 '₹73,990',
 '₹24,990',
 '₹55,990',
 '₹69,990',
 '₹1,01,990',
 '₹59,990',
 '₹54,990',
 '₹61,990',
 '₹75,990']
Hi all,
Thank you for your answers.
That is much clearer to me now.
You are right, I had the wrong class for the rating.
Thank you snippsat for your solution.
I just have to modify the code a bit, because with this code:
for a in soup.findAll('a',href=True, attrs={'class':'_31qSD5'}):
    name=a.find('div', attrs={'class':'_3wU53n'})
    price=a.find('div', attrs={'class':'_1vC4OE _2rQ-NK'})
    rating=a.find('div', attrs={'class':'hGSR34'})
    products.append(name.text)
    prices.append(price.text)
    try:
        ratings.append(rating.text)
    except AttributeError:
        pass 
df = pd.DataFrame({'Product Name':products,'Price':prices,'Rating':ratings}) 
df.to_csv('products.csv', index=False, encoding='utf-8')
if I run it with "pass", I can't create my CSV file afterwards because the lists don't have the same length.
So I can replace "pass" with a None value, like this:
for a in soup.findAll('a',href=True, attrs={'class':'_31qSD5'}):
    name=a.find('div', attrs={'class':'_3wU53n'})
    price=a.find('div', attrs={'class':'_1vC4OE _2rQ-NK'})
    rating=a.find('div', attrs={'class':'hGSR34'})
    products.append(name.text)
    prices.append(price.text)
    try:
        ratings.append(rating.text)
    except AttributeError:
        ratings.append(None)  # placeholder keeps the lists the same length
df = pd.DataFrame({'Product Name':products,'Price':prices,'Rating':ratings}) 
df.to_csv('products.csv', index=False, encoding='utf-8')
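
For the lists to end up the same length, a value has to be appended for every product, also when the rating is missing. A minimal sketch of that idea using only the standard library: text_or_none is a hypothetical helper, and the SimpleNamespace objects with their made-up values only stand in for the tags returned by a.find(...).

```python
from types import SimpleNamespace


def text_or_none(element):
    """Return element.text, or None when the lookup found nothing."""
    return element.text if element is not None else None


# Stand-ins for two products; the second one has no rating element.
found = [
    (SimpleNamespace(text="Laptop A"), SimpleNamespace(text="₹39,990"),
     SimpleNamespace(text="4.2")),
    (SimpleNamespace(text="Laptop B"), SimpleNamespace(text="₹35,990"), None),
]

products, prices, ratings = [], [], []
for name, price, rating in found:
    products.append(text_or_none(name))
    prices.append(text_or_none(price))
    ratings.append(text_or_none(rating))  # appends None for a missing rating

# All three lists now have one entry per product.
print(len(products), len(prices), len(ratings))
print(ratings)
```

With equal-length lists the pd.DataFrame call should no longer complain, and the missing rating should show up as an empty cell in products.csv.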
Best regards.

Alexis