Sep-14-2020, 01:26 PM
In my Python shell, the image src values look like this (first listing below):
But in my CSV file, the image src looks like this (second listing below):
In the Python shell I get multiple image URLs for each item, but in my CSV file I get only one image URL per item, and it is still wrapped in its HTML tag. The product title, price and rating are written to the CSV correctly; only the image URLs are incomplete. Here is an example of the full output I get in the Python shell (third listing below):
Here is my full code (last listing below):
Python shell output (image links):
```
image link:https://images-na.ssl-images-/images/I/41oJQTxCbZL._AC_US40_.jpg
image link:https://images-na.ssl-images-/images/I/4152DCmmGFL._AC_US40_.jpg
image link:https://images-na.ssl-images-/images/I/41ayV4UraXL._AC_US40_.jpg
image link:https://images-na.ssl-images-/images/I/310z8LQ%2BoYL._AC_US40_.jpg
image link:https://images-na.ssl-images-/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif
```
CSV output (image column):
```
<img alt="" src="https://images-na.ssl-images-/images/G/01/x-locale/common/transparent-f"/>
```
Full Python shell output for one product:
```
product_link: https:///gp/slredirect/picassoRedirect.html/ref=pa_sp_btf_aps_sr_pg1_1?ie=UTF8&adId=A002532917E3JT34GS1DE&url=%2FWireless-Vssoplor-Portable-Computer-Computer-Black%2Fdp%2FB07RLYJJBX%2Fref%3Dsr_1_22_sspa%3Fcrid%3D22TI4BA3RLK5J%26dchild%3D1%26keywords%3Dwireless%2Bmouse%26qid%3D1599517835%26sprefix%3Dw%252Caps%252C528%26sr%3D8-22-spons%26psc%3D1&qualifier=1600050591&id=4126203954910776&widgetName=sp_btf

product_title: Wireless Mouse, Vssoplor 2.4G Slim Portable Computer Mice with Nano Receiver for Notebook, PC, Laptop, Computer - Black and Sapphire Blue

product_price: $10.99

product_rating: 2,262 ratings

image link:https://images-na.ssl-images-/images/I/41oJQTxCbZL._AC_US40_.jpg
image link:https://images-na.ssl-images-/images/I/4152DCmmGFL._AC_US40_.jpg
image link:https://images-na.ssl-images-/images/I/41ayV4UraXL._AC_US40_.jpg
image link:https://images-na.ssl-images-/images/I/310z8LQ%2BoYL._AC_US40_.jpg
image link:https://images-na.ssl-images-/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif
```
Full code:
```python
import csv
import io

import requests
from bs4 import BeautifulSoup

for page_num in range(1):
    url = "{}&crid=22TI4BA3RLK5J&qid=1599517835&sprefix=w%2Caps%2C528&ref=sr_pg_2".format(page_num)
    r = requests.get(url, headers=headers, proxies=proxies, auth=auth).text
    soup = BeautifulSoup(r, 'lxml')
    container = soup.find_all('h2', {'class': 'a-size-mini a-spacing-none a-color-base s-line-clamp-2'})
    for containers in container:
        #print(f"page_number:{url}\n\nproduct_link:{product_link}")
        # here I start scraping the details page of each product
        details_page = requests.get(product_link, headers=headers, proxies=proxies, auth=auth).text
        dpsoup = BeautifulSoup(details_page, 'lxml')
        title = dpsoup.find('span', id='productTitle')
        if title is not None:
            title = title.text.strip()
        else:
            title = None
        rating = dpsoup.find('span', id='acrCustomerReviewText')
        if rating is not None:
            rating = rating.text
        else:
            rating = None
        price = dpsoup.find('span', class_='a-size-mini twisterSwatchPrice')
        if price is not None:
            price = price.text
        else:
            price = None
        print(f'\nproduct_link: {product_link}\n\nproduct_title: {title}\n\nproduct_price: {price}\n\nproduct_rating: {rating}\n\n')
        # this scrapes the src of every gallery image
        for url in dpsoup.select('span.a-button-text > img')[3:10]:
            print(f"image link:{url['src']}")
        with io.open("amazon.csv", "a", encoding="utf-8") as f:
            writeFile = csv.writer(f)
            writeFile.writerow([url, product_link, title, rating, price])
```
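A possible explanation, offered as a sketch rather than a confirmed fix: in the code above, `writerow` appears to run once after the image loop has finished, so `url` holds only the last `<img>` Tag object, and the CSV writer stores that object as its HTML text. Collecting each tag's `src` string in a list and writing the whole list would keep every URL. The names `image_srcs` and `write_product_row`, the `" | "` separator and the example.com data are all hypothetical; plain dicts stand in for the bs4 Tag objects, which also support `.get("src")`.

```python
import csv
import io


def image_srcs(img_tags):
    """Return the src string of every tag that has one.

    Works with anything that offers .get("src"), such as bs4 Tag objects;
    plain dicts stand in for them in the demo below.
    """
    return [tag.get("src") for tag in img_tags if tag.get("src")]


def write_product_row(f, product_link, title, rating, price, img_tags):
    """Write one CSV row per product, with all image URLs in the first cell."""
    srcs = image_srcs(img_tags)
    # " | " is an arbitrary separator chosen for this sketch.
    csv.writer(f).writerow([" | ".join(srcs), product_link, title, rating, price])
    return srcs


# Demo with made-up data; in the scraper, the list would be
# dpsoup.select('span.a-button-text > img')[3:10] instead of these dicts.
fake_tags = [
    {"src": "https://example.com/a.jpg", "alt": ""},
    {"src": "https://example.com/b.jpg", "alt": ""},
    {"alt": "no src here"},  # skipped because it has no src
]
buffer = io.StringIO()
written = write_product_row(
    buffer, "https://example.com/item", "Mouse", "2,262 ratings", "$10.99", fake_tags
)
print(buffer.getvalue())
```

With a real file instead of the in-memory buffer, opening it as `open("amazon.csv", "a", encoding="utf-8", newline="")` follows the csv module's recommendation and avoids blank rows on Windows.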