Thread: not getting image src in my BeautifulSoup csv file (/thread-29641.html), Python Forum (https://python-forum.io), Web Scraping & Web Development

not getting image src in my BeautifulSoup csv file - farhan275 - Sep-14-2020

In my Python shell, the image src values look like this:

image link:https://images-na.ssl-images-amazon.com/images/I/41oJQTxCbZL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/4152DCmmGFL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/41ayV4UraXL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/310z8LQ%2BoYL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif

But in my csv file, the image src looks like this:

<img alt="" src="https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif"/>

The Python shell shows multiple image URLs for each item, but my csv file gets only one image URL per item, and it is wrapped in the HTML tag. Product title, product price and product rating are written to my csv correctly, but I am not getting all the image URLs for each item.

Here is an example of the final output I get in the Python shell:

product_link: https://www.amazon.com/gp/slredirect/picassoRedirect.html/ref=pa_sp_btf_aps_sr_pg1_1?ie=UTF8&adId=A002532917E3JT34GS1DE&url=%2FWireless-Vssoplor-Portable-Computer-Computer-Black%2Fdp%2FB07RLYJJBX%2Fref%3Dsr_1_22_sspa%3Fcrid%3D22TI4BA3RLK5J%26dchild%3D1%26keywords%3Dwireless%2Bmouse%26qid%3D1599517835%26sprefix%3Dw%252Caps%252C528%26sr%3D8-22-spons%26psc%3D1&qualifier=1600050591&id=4126203954910776&widgetName=sp_btf
product_title: Wireless Mouse, Vssoplor 2.4G Slim Portable Computer Mice with Nano Receiver for Notebook, PC, Laptop, Computer-Black and Sapphire Blue
product_price: $10.99
product_rating: 2,262 ratings
image link:https://images-na.ssl-images-amazon.com/images/I/41oJQTxCbZL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/4152DCmmGFL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/41ayV4UraXL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/310z8LQ%2BoYL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif

Here is my full code:

for page_num in range(1):
    url = "https://www.amazon.com/s?k=wireless+mouse&page={}&crid=22TI4BA3RLK5J&qid=1599517835&sprefix=w%2Caps%2C528&ref=sr_pg_2".format(page_num)
    r = requests.get(url,headers=headers,proxies=proxies,auth=auth).text
    soup = BeautifulSoup(r,'lxml')
    container = soup.find_all('h2',{'class':'a-size-mini a-spacing-none a-color-base s-line-clamp-2'})
    for containers in container:
        product_link = f"https://www.amazon.com{containers.find('a')['href']}"
        #print(f"page_number:{url}\n\nproduct_link:{product_link}")
        # here I start scraping the details page of each product
        details_page = requests.get(product_link,headers=headers,proxies=proxies,auth=auth).text
        dpsoup = BeautifulSoup(details_page,'lxml')
        title = dpsoup.find('span', id='productTitle')
        if title is not None:
            title = title.text.strip()
        else:
            title= None
        rating = dpsoup.find('span', id='acrCustomerReviewText')
        if rating is not None:
            rating = rating.text
        else:
            rating = None
        price = dpsoup.find('span', class_='a-size-mini twisterSwatchPrice')
        if price is not None:
            price = price.text
        else:
            price = None
        print(f'\nproduct_link: {product_link}\n\nproduct_title: {title}\n\nproduct_price: {price}\n\nproduct_rating: {rating}\n\n')
        # this scrapes all gallery image src
        for url in dpsoup.select('span.a-button-text > img')[3:10]:
            print(f"image link:{url['src']}")
        with io.open("amazon.csv", "a",encoding="utf-8") as f:
            writeFile = csv.writer(f)
            writeFile.writerow([url,product_link ,title,rating,price])

RE: not getting image src in my BeautifulSoup csv file - buran - Sep-14-2020

Looking at

print(f"image link:{url['src']}")

vs

writeFile.writerow([url,product_link ,title,rating,price])

don't you see the difference? url['src'] vs url.

Also, if you get just the last url, I think that in your actual code lines 36-38 are not inside the for loop. It's clear that this is not the code you run, because the indentation is wrong in multiple places, e.g. you have 4-space, 3-space and 2-space indentation per level.

RE: not getting image src in my BeautifulSoup csv file - farhan275 - Sep-14-2020

I also tried this, but it is not working. Please give me a solution.

url = print(f"image link:{url['src']}")

RE: not getting image src in my BeautifulSoup csv file - buran - Sep-14-2020

writeFile.writerow([url['src'],product_link ,title,rating,price])

but again, you need to fix the indentation.

RE: not getting image src in my BeautifulSoup csv file - farhan275 - Sep-14-2020

Now I am getting the image URL, but only one image URL for each item in my csv, whereas I get multiple image URLs for each item in my shell.
Please see the indentation. Before, my indentation was:

#this is for scrape all gallray image src for url in dpsoup.select('span.a-button-text > img')[3:10]: print(f"image link:{url['src']}") with io.open("amazon.csv", "a",encoding="utf-8") as f: writeFile = csv.writer(f) writeFile.writerow([url['src'],product_link ,title,rating,price])

Now my indentation after your post:

#this is for scrape all gallray image src for url in dpsoup.select('span.a-button-text > img')[3:10]: print(f"image link:{url['src']}") with io.open("amazon.csv", "a",encoding="utf-8") as f: writeFile = csv.writer(f) writeFile.writerow([url['src'],product_link ,title,rating,price])

RE: not getting image src in my BeautifulSoup csv file - buran - Sep-14-2020

(Sep-14-2020, 01:58 PM)farhan275 Wrote: now my indentation after your post:

This is exactly the opposite of what I told you to do. Your "original" code had mixed indentation - some levels were 4 spaces, some levels were 2 spaces. So it was not the actual code you were running (or you would get IndentationError). Your code should be

for url in dpsoup.select('span.a-button-text > img')[3:10]:
    print(f"image link:{url['src']}")
    with io.open("amazon.csv", "a",encoding="utf-8") as f:
        writeFile = csv.writer(f)
        writeFile.writerow([url['src'],product_link ,title,rating,price])

or even better, to avoid opening and closing the file multiple times:

with io.open("amazon.csv", "a",encoding="utf-8") as f:
    for url in dpsoup.select('span.a-button-text > img')[3:10]:
        print(f"image link:{url['src']}")
        writeFile = csv.writer(f)
        writeFile.writerow([url['src'],product_link ,title,rating,price])

RE: not getting image src in my BeautifulSoup csv file - farhan275 - Sep-14-2020

Still the same problem after fixing the IndentationError. I am getting only one image URL for each item, whereas my Python shell shows multiple image URLs for each item.
Here is my modified full code:

for page_num in range(1):
    url = "https://www.amazon.com/s?k=wireless+mouse&page={}&crid=22TI4BA3RLK5J&qid=1599517835&sprefix=w%2Caps%2C528&ref=sr_pg_2".format(page_num)
    r = requests.get(url,headers=headers,proxies=proxies,auth=auth).text
    soup = BeautifulSoup(r,'lxml')
    container = soup.find_all('h2',{'class':'a-size-mini a-spacing-none a-color-base s-line-clamp-2'})
    for containers in container:
        product_link = f"https://www.amazon.com{containers.find('a')['href']}"
        #print(f"page_number:{url}\n\nproduct_link:{product_link}")
        # here I start scraping the details page of each product
        details_page = requests.get(product_link,headers=headers,proxies=proxies,auth=auth).text
        dpsoup = BeautifulSoup(details_page,'lxml')
        title = dpsoup.find('span', id='productTitle')
        if title is not None:
            title = title.text.strip()
        else:
            title= None
        rating = dpsoup.find('span', id='acrCustomerReviewText')
        if rating is not None:
            rating = rating.text
        else:
            rating = None
        price = dpsoup.find('span', class_='a-size-mini twisterSwatchPrice')
        if price is not None:
            price = price.text
        else:
            price = None
        print(f'\nproduct_link: {product_link}\n\nproduct_title: {title}\n\nproduct_price: {price}\n\nproduct_rating: {rating}\n\n')
        # this scrapes all gallery image src
        with io.open("amazon.csv", "a",encoding="utf-8") as f:
            for url in dpsoup.select('span.a-button-text > img')[3:10]:
                print(f"image link:{url['src']}")
                writeFile = csv.writer(f)
                writeFile.writerow([url['src'],product_link ,title,rating,price])

RE: not getting image src in my BeautifulSoup csv file - buran - Sep-14-2020

Make sure you saved the file after editing. What you say doesn't make sense: if it prints all the URLs (line 36), it will write the same to the file (line 38). And by the way, it doesn't affect your code, but instead of io.open you can just use open.

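The two fixes discussed in the replies above (writing url['src'] rather than url, and keeping writerow inside the loop) can be illustrated without touching Amazon at all. This is only a minimal sketch: FakeTag is a hypothetical stand-in for a BeautifulSoup img tag, the example.com URLs and the helper name write_image_rows are made up, and an in-memory io.StringIO buffer is used in place of amazon.csv.

```python
import csv
import io


class FakeTag:
    """Hypothetical stand-in for a BeautifulSoup <img> tag (not the real class)."""

    def __init__(self, src):
        self.attrs = {"alt": "", "src": src}

    def __getitem__(self, key):
        # Mimics tag["src"]: look up an attribute by name.
        return self.attrs[key]

    def __str__(self):
        # Mimics a tag rendering as markup when converted to a string.
        return f'<img alt="" src="{self.attrs["src"]}"/>'


def write_image_rows(f, images, product_link, title, rating, price):
    """Write one CSV row per image, creating the writer only once."""
    writer = csv.writer(f)
    for img in images:
        # writerow is inside the loop, so every image gets its own row.
        writer.writerow([img["src"], product_link, title, rating, price])


images = [
    FakeTag("https://example.com/a.jpg"),
    FakeTag("https://example.com/b.jpg"),
]

# csv.writer calls str() on non-string values, so passing the tag object
# itself stores the whole markup instead of just the URL.
buf_wrong = io.StringIO()
csv.writer(buf_wrong).writerow([images[0], "link", "title", "rating", "price"])
wrong_rows = list(csv.reader(io.StringIO(buf_wrong.getvalue())))

buf_right = io.StringIO()
write_image_rows(buf_right, images, "link", "title", "rating", "price")
rows = list(csv.reader(io.StringIO(buf_right.getvalue())))

print(wrong_rows[0][0])
for row in rows:
    print(row[0])
```

With a real file, the same helper could be called as write_image_rows(f, ...) inside with open("amazon.csv", "a", encoding="utf-8", newline="") as f:. The csv documentation recommends newline="" for files passed to csv.writer, and in Python 3 io.open is the same function as the built-in open.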
RE: not getting image src in my BeautifulSoup csv file - farhan275 - Sep-14-2020

See my Python shell. Here I am getting multiple image URLs for every product:

product_link: https://www.amazon.com/Wireless-Uiosmuph-Rechargeable-Portable-Computer/dp/B082M9D31R/ref=sr_1_19?crid=22TI4BA3RLK5J&dchild=1&keywords=wireless+mouse&qid=1599517835&sprefix=w%2Caps%2C528&sr=8-19
product_title: LED Wireless Mouse, Uiosmuph G12 Slim Rechargeable Wireless Silent Mouse, 2.4G Portable USB Optical Wireless Computer Mice with USB Receiver and Type C Adapter (Black)
product_price: $15.99
product_rating: 3,801 ratings
image link:https://images-na.ssl-images-amazon.com/images/I/41JjzLoJGzL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/41bedZPDK4L._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/41JpcX8MKfL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/513b4p8dFbL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif

product_link: https://www.amazon.com/VicTsing-Wireless-Receiver-Noiseless-Computer/dp/B073CDQ1Z4/ref=sr_1_20?crid=22TI4BA3RLK5J&dchild=1&keywords=wireless+mouse&qid=1599517835&sprefix=w%2Caps%2C528&sr=8-20
product_title: VicTsing [Upgraded] Slim Wireless Mouse, 2.4G Silent Laptop Mouse- Enjoy Noiseless Clicking, 1600DPI High Accuracy Portable Ergonomic Optical Wireless Mouse for Laptop, PC, Computer, Notebook, Mac
product_price: $8.99
product_rating: 9,813 ratings
image link:https://images-na.ssl-images-amazon.com/images/I/410U75vLdZL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/41DuollLgyL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/51XcrXNTWSL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/51m74igY8bL._SS40_BG85,85,85_BR-120_PKdp-play-icon-overlay__.jpg
image link:https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif

product_link: https://www.amazon.com/Rii-Wireless-Rechargeable-Colorful-Mice-Black/dp/B07JHHTGFK/ref=sr_1_21?crid=22TI4BA3RLK5J&dchild=1&keywords=wireless+mouse&qid=1599517835&sprefix=w%2Caps%2C528&sr=8-21
product_title: Rii RM200 Wireless Mouse,2.4G Wireless Mouse 5 Buttons Rechargeable Mobile Optical Mouse with USB Nano Receiver,3 Adjustable DPI Levels,Colorful LED Lights for Notebook,PC,Computer-Black
product_price: None
product_rating: 4,071 ratings
image link:https://images-na.ssl-images-amazon.com/images/I/41D3bPA0rjL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/41xycao2DvL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif

product_link: https://www.amazon.com/gp/slredirect/picassoRedirect.html/ref=pa_sp_btf_aps_sr_pg1_1?ie=UTF8&adId=A05990032I8VXHPWQ12H5&url=%2FMemzuoix-Wireless-Portable-Receiver-Ergonomic%2Fdp%2FB07WCW1PB3%2Fref%3Dsr_1_22_sspa%3Fcrid%3D22TI4BA3RLK5J%26dchild%3D1%26keywords%3Dwireless%2Bmouse%26qid%3D1599517835%26sprefix%3Dw%252Caps%252C528%26sr%3D8-22-spons%26psc%3D1&qualifier=1600094980&id=1873951770229141&widgetName=sp_btf
product_title: Memzuoix 2.4G Wireless Mouse, Portable Mobile Cordless Mouse with USB Receiver, 1,000 DPI Ergonomic Computer Wireless Mouse for Laptop, Desktop, Mac, PC, 5 Buttons, Red
product_price: $12.99
product_rating: 609 ratings
image link:https://images-na.ssl-images-amazon.com/images/I/61JiHsWLn6L._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/51v8m7XkV%2BL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/51S7CTnE5aL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/410fJq0LtLL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif

But in my csv I am still getting only one product url for each product.

RE: not getting image src in my BeautifulSoup csv file - buran - Sep-14-2020

You distinguish between img url and product url, do you?
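
One way to check the distinction raised in the last reply is to read the csv back and group the first column (image URL) by the second column (product link). With one row per image, the product link is expected to repeat across rows while the image URL changes. The following is only a sketch under assumptions: the column order is taken from the writerow call in the thread, while the sample rows, the example.com URLs and the helper name images_per_product are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the thread's column order:
# image URL, product link, title, rating, price.
sample = (
    "https://example.com/a1.jpg,https://example.com/product-a,Mouse A,10 ratings,$9.99\n"
    "https://example.com/a2.jpg,https://example.com/product-a,Mouse A,10 ratings,$9.99\n"
    "https://example.com/b1.jpg,https://example.com/product-b,Mouse B,20 ratings,$8.99\n"
)


def images_per_product(f):
    """Map each product link to the list of image URLs written for it."""
    grouped = defaultdict(list)
    for image_url, product_link, *_rest in csv.reader(f):
        grouped[product_link].append(image_url)
    return dict(grouped)


grouped = images_per_product(io.StringIO(sample))
for product_link, image_urls in grouped.items():
    print(product_link, len(image_urls))
```

For the real file, the same helper could be given open("amazon.csv", encoding="utf-8", newline="") instead of the in-memory sample.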