Sep-14-2020, 01:26 PM
The image src values printed in my Python shell look like this:
image link:https://images-na.ssl-images-amazon.com/images/I/41oJQTxCbZL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/4152DCmmGFL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/41ayV4UraXL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/310z8LQ%2BoYL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif

But the image src in my CSV file looks like this:
<img alt="" src="https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif"/>

In the Python shell I get multiple image URLs for each item, but in my CSV file I get only one image URL per item, and it is written with the HTML tag. The product title, price and rating are written to the CSV correctly; only the image URLs are incomplete. Here is an example of the final output I get in the Python shell:
product_link: https://www.amazon.com/gp/slredirect/picassoRedirect.html/ref=pa_sp_btf_aps_sr_pg1_1?ie=UTF8&adId=A002532917E3JT34GS1DE&url=%2FWireless-Vssoplor-Portable-Computer-Computer-Black%2Fdp%2FB07RLYJJBX%2Fref%3Dsr_1_22_sspa%3Fcrid%3D22TI4BA3RLK5J%26dchild%3D1%26keywords%3Dwireless%2Bmouse%26qid%3D1599517835%26sprefix%3Dw%252Caps%252C528%26sr%3D8-22-spons%26psc%3D1&qualifier=1600050591&id=4126203954910776&widgetName=sp_btf
product_title: Wireless Mouse, Vssoplor 2.4G Slim Portable Computer Mice with Nano Receiver for Notebook, PC, Laptop, Computer-Black and Sapphire Blue
product_price: $10.99
product_rating: 2,262 ratings
image link:https://images-na.ssl-images-amazon.com/images/I/41oJQTxCbZL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/4152DCmmGFL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/41ayV4UraXL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/310z8LQ%2BoYL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif

Here is my full code:
import csv
import io

import requests
from bs4 import BeautifulSoup

# headers, proxies and auth are not shown in this post; they are assumed to be defined before this point

for page_num in range(1):
    url = "https://www.amazon.com/s?k=wireless+mouse&page={}&crid=22TI4BA3RLK5J&qid=1599517835&sprefix=w%2Caps%2C528&ref=sr_pg_2".format(page_num)
    r = requests.get(url, headers=headers, proxies=proxies, auth=auth).text
    soup = BeautifulSoup(r, 'lxml')
    container = soup.find_all('h2', {'class': 'a-size-mini a-spacing-none a-color-base s-line-clamp-2'})
    for containers in container:
        product_link = f"https://www.amazon.com{containers.find('a')['href']}"
        #print(f"page_number:{url}\n\nproduct_link:{product_link}")

        # from here on I scrape the details page of each product
        details_page = requests.get(product_link, headers=headers, proxies=proxies, auth=auth).text
        dpsoup = BeautifulSoup(details_page, 'lxml')

        title = dpsoup.find('span', id='productTitle')
        if title is not None:
            title = title.text.strip()
        else:
            title = None

        rating = dpsoup.find('span', id='acrCustomerReviewText')
        if rating is not None:
            rating = rating.text
        else:
            rating = None

        price = dpsoup.find('span', class_='a-size-mini twisterSwatchPrice')
        if price is not None:
            price = price.text
        else:
            price = None

        print(f'\nproduct_link: {product_link}\n\nproduct_title: {title}\n\nproduct_price: {price}\n\nproduct_rating: {rating}\n\n')

        # this scrapes the src of every gallery image
        for url in dpsoup.select('span.a-button-text > img')[3:10]:
            print(f"image link:{url['src']}")

        with io.open("amazon.csv", "a", encoding="utf-8") as f:
            writeFile = csv.writer(f)
            writeFile.writerow([url, product_link, title, rating, price])
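
A possible explanation, offered as a sketch rather than a confirmed fix: in the code above, `writerow` appears to receive `url` after the image loop has finished, so it holds only the last `img` tag, and writing the tag object itself puts its HTML into the CSV instead of its `src`. One way around this is to collect every `src` first and write them together. The helper names `image_sources` and `write_product_row`, the `" | "` separator, and the example.com URLs below are all hypothetical; plain dicts stand in for the BeautifulSoup tags, since both support `tag["src"]`.

```python
import csv
import io


def image_sources(image_tags):
    """Return the src value of each tag (bs4 Tags or dicts both support tag["src"])."""
    return [tag["src"] for tag in image_tags]


def write_product_row(writer, image_tags, product_link, title, rating, price):
    """Write one CSV row that keeps every image URL instead of only the last tag."""
    srcs = image_sources(image_tags)
    # Join all URLs into a single cell; the separator is an arbitrary choice.
    writer.writerow([" | ".join(srcs), product_link, title, rating, price])


# Stand-in data: these dicts only mimic the tag["src"] lookup used in the post.
tags = [
    {"src": "https://example.com/a.jpg"},
    {"src": "https://example.com/b.jpg"},
]

buffer = io.StringIO()  # in-memory file so the sketch does not touch amazon.csv
write_product_row(
    csv.writer(buffer),
    tags,
    "https://example.com/item",
    "Wireless Mouse",
    "2,262 ratings",
    "$10.99",
)
print(buffer.getvalue())
```

In the original script this would presumably mean passing `dpsoup.select('span.a-button-text > img')[3:10]` as `image_tags`. If one column per image is preferred, the list of URLs could instead be appended to the end of the row.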