Feb-03-2021, 10:04 PM
#!/usr/bin/env python
# coding: utf-8

# In[ ]:
import bs4
from urllib.request import urlopen as uReq  # Web client
from bs4 import BeautifulSoup as soup

# In[ ]:
page_url = 'https://www.newegg.com/p/pl?d=graphics+cards'
uReq(page_url)

# In[ ]:
# opens the connection and downloads the html page from the url
uClient = uReq(page_url)

# In[ ]:
# parses html into a soup data structure to traverse html
# as if it were a json type.
page_soup = soup(uClient.read(), 'html.parser')
uClient.close()

# In[ ]:
# finds each product from the store page
# (findAll returns a ResultSet, which is what the traceback reports)
containers = page_soup.findAll("div", {"class": "item-container"})

# In[ ]:
# name the output file to write to local disk
out_filename = "graphics_cards.csv"
# header of csv file to be written
headers = "brand,product_name,shipping \n"

# In[ ]:
# opens file and writes headers
f = open(out_filename, "w")
f.write(headers)

# In[ ]:
# loops over each product and grabs attributes about
# each product
for container in containers:
    # Finds all link tags "a" from within the first div.
    make_rating_sp = containers.div.select("a")

# In[ ]:
# Grabs the title from the title attribute,
# then does proper casing using .title()
brand = make_rating_sp[0].img["title"].title()

# In[ ]:
# Grabs the text within the second "(a)" tag from within
# a list of queries
product_name = container.div.select("a")[2].text

# In[ ]:
# Grabs the product shipping information by searching
# all lists with the class "price-ship".
# Then cleans the text of white space with strip()
# Cleans the string of "Shipping $" if it exists to just get the number
shipping = container.findAll("li", {"class": "price-ship"})[0].text.strip().replace("$", "").replace(" Shipping", "")

# In[ ]:
# prints the dataset to the console
print("brand: " + brand + "\n")
print("product_name: " + product_name + "\n")
print("shipping: " + shipping + "\n")

# In[ ]:
# writes the dataset to file
f.write(brand + ", " + product_name.replace(", ", "|") + ", " + shipping + "\n")

# In[ ]:
f.close()  # Close the file

When I run this on Python 3.8 or higher, it gives an error. I believe that I understand the error, but I
do not understand the correction.
Here is the error and here is the correction suggested.
Error:
AttributeError Traceback (most recent call last)
<ipython-input-8-2e96067d0256> in <module>()
3 for container in containers:
4 # Finds all link tags "a" from within the first div.
----> 5 make_rating_sp = containers.div.select("a")
6
C:\Users\james\miniconda3\envs\py-dep\lib\site-packages\bs4\element.py in __getattr__(self, key)
2172 """Raise a helpful exception to explain a common code fix."""
2173 raise AttributeError(
-> 2174 "ResultSet object has no attribute '%s'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" % key
2175 )
AttributeError: ResultSet object has no attribute 'div'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
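As far as I can tell, the error can be reproduced with a minimal sketch like the one below. The HTML string is a made-up stand-in for the store page (not Newegg's real markup), and the name list_error is only there for illustration. It suggests the problem is calling .div on the whole list (containers) rather than on one element of it (container).

```python
from bs4 import BeautifulSoup

# Hypothetical miniature page standing in for the real store listing.
html = """
<div class="item-container"><div class="item-info"><a>Card A</a></div></div>
<div class="item-container"><div class="item-info"><a>Card B</a></div></div>
"""
page_soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet, which behaves like a list of Tag objects.
containers = page_soup.find_all("div", {"class": "item-container"})

list_error = None
try:
    containers.div  # attribute lookup on the list itself fails
except AttributeError as err:
    list_error = str(err)

# Each element of the list is a single Tag, so .div works on it.
names = [container.div.select("a")[0].text for container in containers]
print(list_error)
print(names)
```

In the loop, that would mean writing container.div (singular) instead of containers.div.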
In fact, I will post the correction now:
Quote: MINOR SUGGESTION
As of 10/03/2019, If you are following along this tutorial. "container.div" won't give you the div with the "item-info" class. Instead it will give you the div with the "item-badges" class. This is because the latter occurs before the former. When you access any tag with the dot(.) operator, it will just return the first instance of that tag. I had a problem following this along until i figured this out. To solve this just use the "find()" method to find exactly the div which contains the information that you want. For e.g. divWithInfo = containers[0].find("div","item-info")
The link for that can be found here: https://www.youtube.com/watch?v=XQgXKtPSzUI&t=1725s
I do not know what to do with the snippet: divWithInfo = containers[0].find('div',"item-info")
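A minimal sketch of how that snippet might slot into the loop, assuming the page puts an "item-badges" div before the "item-info" div as the quoted comment describes. The HTML below is a made-up stand-in, and the names div_with_info, links and rows are hypothetical. Inside the loop the snippet would use container (the current product) rather than containers[0] (always the first product).

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the badges div comes first, the info div second.
html = """
<div class="item-container">
  <div class="item-badges"><a>Sale</a></div>
  <div class="item-info">
    <a><img title="asus"></a>
    <a>ASUS Example Card</a>
  </div>
</div>
"""
page_soup = BeautifulSoup(html, "html.parser")
containers = page_soup.find_all("div", {"class": "item-container"})

rows = []
for container in containers:
    # Dot access returns the FIRST div, which here is "item-badges".
    first_div = container.div
    # find() with a class name picks the div that holds the product data.
    div_with_info = container.find("div", "item-info")
    links = div_with_info.select("a")
    brand = links[0].img["title"].title()
    product_name = links[1].text
    rows.append((first_div["class"], brand, product_name))

print(rows)
```

The indexes into links depend on the real page layout, so they would likely need adjusting against the live HTML.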
Looking at all of the replies, it would appear that there are several suggestions, but which one will work?
I cannot get the code box to work so I decided to post the error anyway and edit it later.
Any help appreciated. Thanks.
Respectfully,
LZ