Python Forum
not getting image src in my BeautifulSoup csv file


not getting image src in my BeautifulSoup csv file - farhan275 - Sep-14-2020

In my Python shell I get the image src values like this:
image link:https://images-na.ssl-images-amazon.com/images/I/41oJQTxCbZL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/4152DCmmGFL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/41ayV4UraXL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/310z8LQ%2BoYL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif
But in my CSV file the image src looks like this:
<img alt="" src="https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif"/> 
In the Python shell I get multiple image URLs for each item, but in my CSV file I get only one image URL per item, and it still has the HTML tag around it. Product title, product price and product rating are written to the CSV correctly; only the image URLs are wrong. Here is an example of the final output I get in the Python shell:

product_link: https://www.amazon.com/gp/slredirect/picassoRedirect.html/ref=pa_sp_btf_aps_sr_pg1_1?ie=UTF8&adId=A002532917E3JT34GS1DE&url=%2FWireless-Vssoplor-Portable-Computer-Computer-Black%2Fdp%2FB07RLYJJBX%2Fref%3Dsr_1_22_sspa%3Fcrid%3D22TI4BA3RLK5J%26dchild%3D1%26keywords%3Dwireless%2Bmouse%26qid%3D1599517835%26sprefix%3Dw%252Caps%252C528%26sr%3D8-22-spons%26psc%3D1&qualifier=1600050591&id=4126203954910776&widgetName=sp_btf

product_title: Wireless Mouse, Vssoplor 2.4G Slim Portable Computer Mice with Nano Receiver for Notebook, PC, Laptop, Computer-Black and Sapphire Blue

product_price:  $10.99 

product_rating: 2,262 ratings


image link:https://images-na.ssl-images-amazon.com/images/I/41oJQTxCbZL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/4152DCmmGFL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/41ayV4UraXL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/310z8LQ%2BoYL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif  
Here is my full code:
for page_num in range(1):
    url = "https://www.amazon.com/s?k=wireless+mouse&page={}&crid=22TI4BA3RLK5J&qid=1599517835&sprefix=w%2Caps%2C528&ref=sr_pg_2".format(page_num)
    r = requests.get(url,headers=headers,proxies=proxies,auth=auth).text
    soup = BeautifulSoup(r,'lxml')

    container = soup.find_all('h2',{'class':'a-size-mini a-spacing-none a-color-base s-line-clamp-2'})
    for containers in container:
        product_link = f"https://www.amazon.com{containers.find('a')['href']}"
        #print(f"page_number:{url}\n\nproduct_link:{product_link}")

        # here I start scraping the details page of each product
        details_page = requests.get(product_link,headers=headers,proxies=proxies,auth=auth).text
        dpsoup = BeautifulSoup(details_page,'lxml')

        
        title = dpsoup.find('span', id='productTitle')
        if title is not None:
          title = title.text.strip()
        else:
           title= None
        rating = dpsoup.find('span', id='acrCustomerReviewText')
        if rating is not None:
           rating = rating.text
        else:
           rating = None
        price = dpsoup.find('span', class_='a-size-mini twisterSwatchPrice')
        if price is not None:
           price = price.text
        else:
           price = None
        print(f'\nproduct_link: {product_link}\n\nproduct_title: {title}\n\nproduct_price: {price}\n\nproduct_rating: {rating}\n\n')

        # scrape the src of every gallery image
        for url in dpsoup.select('span.a-button-text > img')[3:10]:
            print(f"image link:{url['src']}")
            with io.open("amazon.csv", "a",encoding="utf-8") as f:
              writeFile = csv.writer(f)
              writeFile.writerow([url,product_link ,title,rating,price]) 
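Side note on the title/rating/price blocks: the `else: x = None` branches are redundant, because `x` is already `None` when `find()` finds nothing. A small helper can express that check once. This is only a sketch: `text_or_none` is a hypothetical name, the markup is a made-up stand-in for the details page, it assumes bs4 is installed (using the built-in "html.parser" so lxml isn't needed), and `get_text(strip=True)` also trims rating and price, which the original `.text` calls don't.

```python
from bs4 import BeautifulSoup


def text_or_none(tag):
    """Return the stripped text of a bs4 Tag, or None when find() returned None (hypothetical helper)."""
    return tag.get_text(strip=True) if tag is not None else None


# Made-up markup standing in for a product details page.
dpsoup = BeautifulSoup('<span id="productTitle">  Wireless Mouse  </span>', "html.parser")

title = text_or_none(dpsoup.find("span", id="productTitle"))
rating = text_or_none(dpsoup.find("span", id="acrCustomerReviewText"))  # not in the markup
print(title, rating)
```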



RE: not getting image src in my BeautifulSoup csv file - buran - Sep-14-2020

Looking at
print(f"image link:{url['src']}")
vs

writeFile.writerow([url,product_link ,title,rating,price])
don't you see the difference? url['src'] vs. url
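A minimal sketch of why the CSV shows HTML: `csv.writer` converts any non-string value with `str()`, and `str()` of a BeautifulSoup `Tag` is its full markup, while `tag['src']` is just the attribute string. The HTML snippet and URL below are made-up stand-ins for the Amazon page, and the sketch assumes bs4 is installed (it uses the built-in "html.parser" so lxml isn't required).

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical stand-in for one gallery thumbnail on the details page.
html = '<span class="a-button-text"><img alt="" src="https://example.com/thumb.jpg"/></span>'
soup = BeautifulSoup(html, "html.parser")
img = soup.select_one("span.a-button-text > img")

buffer = io.StringIO()  # in-memory file instead of amazon.csv
writer = csv.writer(buffer)
writer.writerow([img])         # the Tag is str()-ed, so the whole <img ...> markup is written
writer.writerow([img["src"]])  # indexing the Tag gives only the attribute value
```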

Also, if you get just the last URL, I think that in your actual code lines 36-38 (the with io.open(...) block) are not inside the for loop. It's clear that this is not the code you run, because the indentation is inconsistent in multiple places, e.g. you have 4-space, 3-space and 2-space indentation per level.
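To illustrate the "only the last URL" symptom: a statement placed after the for loop runs once, with the loop variable still holding its last value. A stdlib-only sketch with made-up file names, writing to in-memory buffers instead of amazon.csv:

```python
import csv
import io

image_urls = ["a.jpg", "b.jpg", "c.jpg"]  # hypothetical sample values

# Wrong: the write is dedented out of the loop, so it runs once with the last url.
outside = io.StringIO()
writer = csv.writer(outside)
for url in image_urls:
    print(f"image link:{url}")
writer.writerow([url])

# Right: the write is indented under the for statement, so it runs once per url.
inside = io.StringIO()
writer = csv.writer(inside)
for url in image_urls:
    print(f"image link:{url}")
    writer.writerow([url])
```

Both loops print all three links, which is why the shell output looks fine even when the file is wrong.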


RE: not getting image src in my BeautifulSoup csv file - farhan275 - Sep-14-2020

I also tried this, but it doesn't work. Please give me a solution:
url = print(f"image link:{url['src']}")
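Assigning the result of `print()` can't work: `print()` writes text to standard output and always returns `None`, so `url = print(...)` throws the tag away. In this sketch a plain dict stands in for the bs4 `Tag` (a hypothetical substitute; both support `url['src']`):

```python
# Hypothetical stand-in for a bs4 <img> Tag; both support url["src"].
url = {"src": "https://example.com/thumb.jpg"}

result = print(f"image link:{url['src']}")  # prints the line...
print(result)  # ...but print() itself returns None

# Keep the attribute value itself if you need it later, e.g. for the CSV row.
src = url["src"]
```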



RE: not getting image src in my BeautifulSoup csv file - buran - Sep-14-2020

writeFile.writerow([url['src'],product_link ,title,rating,price])
But again, you need to fix the indentation.


RE: not getting image src in my BeautifulSoup csv file - farhan275 - Sep-14-2020

Now I get the image URL, but only one image URL per item in my CSV, whereas the shell shows multiple image URLs per item. Please look at the indentation.

Before, my indentation was:
# scrape the src of every gallery image
        for url in dpsoup.select('span.a-button-text > img')[3:10]:
            print(f"image link:{url['src']}")
            with io.open("amazon.csv", "a",encoding="utf-8") as f:
              writeFile = csv.writer(f)
              writeFile.writerow([url['src'],product_link ,title,rating,price]) 
now my indentation after your post:
   # scrape the src of every gallery image
        for url in dpsoup.select('span.a-button-text > img')[3:10]:
            print(f"image link:{url['src']}")
        with io.open("amazon.csv", "a",encoding="utf-8") as f:
             writeFile = csv.writer(f)
             writeFile.writerow([url['src'],product_link ,title,rating,price]) 



RE: not getting image src in my BeautifulSoup csv file - buran - Sep-14-2020

(Sep-14-2020, 01:58 PM)farhan275 Wrote: now my indentation after your post:
This is exactly the opposite of what I told you to do. Your "original" code had mixed indentation: some levels were 4 spaces, some were 2 or 3. Python accepts that as long as each block is consistent, but it makes it very easy to put a line at the wrong level, so I doubt it was exactly the code you were running.
Your code should be:
for url in dpsoup.select('span.a-button-text > img')[3:10]:
    print(f"image link:{url['src']}")
    with io.open("amazon.csv", "a",encoding="utf-8") as f:
        writeFile = csv.writer(f)
        writeFile.writerow([url['src'], product_link, title, rating, price])
Or, even better, to avoid opening and closing the file multiple times:
with io.open("amazon.csv", "a",encoding="utf-8") as f:
    for url in dpsoup.select('span.a-button-text > img')[3:10]:
        print(f"image link:{url['src']}")
        writeFile = csv.writer(f)
        writeFile.writerow([url['src'],product_link ,title,rating,price])
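A slightly tidier variant of the same idea, as a sketch: the writer is created once per file rather than once per row, and the file is opened with `newline=""`, which the csv module documentation recommends so the writer controls line endings (otherwise you can get blank lines between rows on Windows). The function name `append_image_rows` and the sample values are hypothetical, and the temporary path is only for the demo.

```python
import csv
import os
import tempfile


def append_image_rows(path, image_urls, product_link, title, rating, price):
    """Append one CSV row per image URL (hypothetical helper name)."""
    # newline="" lets csv.writer control line endings, as the csv docs recommend.
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)  # one writer for the whole file, not one per row
        for src in image_urls:
            writer.writerow([src, product_link, title, rating, price])


# Demo with made-up values, written to a temporary directory instead of amazon.csv.
path = os.path.join(tempfile.mkdtemp(), "amazon.csv")
append_image_rows(path, ["a.jpg", "b.jpg"], "https://www.amazon.com/dp/X", "Mouse", "10 ratings", "$9.99")
```

In the scraper, `image_urls` would be the `src` values, e.g. `[img['src'] for img in dpsoup.select('span.a-button-text > img')[3:10]]`.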



RE: not getting image src in my BeautifulSoup csv file - farhan275 - Sep-14-2020

I still have the same problem after fixing the indentation. I get only one image URL per item in the CSV, while the Python shell shows multiple image URLs per item. Here is my modified full code:


for page_num in range(1):
    url = "https://www.amazon.com/s?k=wireless+mouse&page={}&crid=22TI4BA3RLK5J&qid=1599517835&sprefix=w%2Caps%2C528&ref=sr_pg_2".format(page_num)
    r = requests.get(url,headers=headers,proxies=proxies,auth=auth).text
    soup = BeautifulSoup(r,'lxml')
    container = soup.find_all('h2',{'class':'a-size-mini a-spacing-none a-color-base s-line-clamp-2'})
    for containers in container:
        product_link = f"https://www.amazon.com{containers.find('a')['href']}"
        #print(f"page_number:{url}\n\nproduct_link:{product_link}")
        # here I start scraping the details page of each product
        details_page = requests.get(product_link,headers=headers,proxies=proxies,auth=auth).text
        dpsoup = BeautifulSoup(details_page,'lxml')
        title = dpsoup.find('span', id='productTitle')
        if title is not None:
          title = title.text.strip()
        else:
           title= None
        rating = dpsoup.find('span', id='acrCustomerReviewText')
        if rating is not None:
           rating = rating.text
        else:
           rating = None
        price = dpsoup.find('span', class_='a-size-mini twisterSwatchPrice')
        if price is not None:
           price = price.text
        else:
           price = None
        print(f'\nproduct_link: {product_link}\n\nproduct_title: {title}\n\nproduct_price: {price}\n\nproduct_rating: {rating}\n\n')
        # scrape the src of every gallery image
        with io.open("amazon.csv", "a", encoding="utf-8") as f:
            for url in dpsoup.select('span.a-button-text > img')[3:10]:
                print(f"image link:{url['src']}")
                writeFile = csv.writer(f)
                writeFile.writerow([url['src'], product_link, title, rating, price])

        



RE: not getting image src in my BeautifulSoup csv file - buran - Sep-14-2020

Make sure you saved the file after editing. What you describe doesn't make sense: if it prints every URL (the print() call), it will write the same URLs to the file (the writerow() call), because both are inside the same loop.
And by the way, though it doesn't affect your code: instead of io.open you can just use open.
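A quick check of that point: in Python 3, `io.open` and the built-in `open` are the same function object, so switching changes nothing about how the file is written.

```python
import builtins
import io

# In Python 3, io.open is an alias of the built-in open().
print(io.open is builtins.open)  # prints True
```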


RE: not getting image src in my BeautifulSoup csv file - farhan275 - Sep-14-2020

See my Python shell output; here I get multiple image URLs for every product:

product_link: https://www.amazon.com/Wireless-Uiosmuph-Rechargeable-Portable-Computer/dp/B082M9D31R/ref=sr_1_19?crid=22TI4BA3RLK5J&dchild=1&keywords=wireless+mouse&qid=1599517835&sprefix=w%2Caps%2C528&sr=8-19

product_title: LED Wireless Mouse, Uiosmuph G12 Slim Rechargeable Wireless Silent Mouse, 2.4G Portable USB Optical Wireless Computer Mice with USB Receiver and Type C Adapter (Black)

product_price:  $15.99 

product_rating: 3,801 ratings


image link:https://images-na.ssl-images-amazon.com/images/I/41JjzLoJGzL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/41bedZPDK4L._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/41JpcX8MKfL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/513b4p8dFbL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif

product_link: https://www.amazon.com/VicTsing-Wireless-Receiver-Noiseless-Computer/dp/B073CDQ1Z4/ref=sr_1_20?crid=22TI4BA3RLK5J&dchild=1&keywords=wireless+mouse&qid=1599517835&sprefix=w%2Caps%2C528&sr=8-20

product_title: VicTsing [Upgraded] Slim Wireless Mouse, 2.4G Silent Laptop Mouse- Enjoy Noiseless Clicking, 1600DPI High Accuracy Portable Ergonomic Optical Wireless Mouse for Laptop, PC, Computer, Notebook, Mac

product_price:  $8.99 

product_rating: 9,813 ratings


image link:https://images-na.ssl-images-amazon.com/images/I/410U75vLdZL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/41DuollLgyL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/51XcrXNTWSL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/51m74igY8bL._SS40_BG85,85,85_BR-120_PKdp-play-icon-overlay__.jpg
image link:https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif

product_link: https://www.amazon.com/Rii-Wireless-Rechargeable-Colorful-Mice-Black/dp/B07JHHTGFK/ref=sr_1_21?crid=22TI4BA3RLK5J&dchild=1&keywords=wireless+mouse&qid=1599517835&sprefix=w%2Caps%2C528&sr=8-21

product_title: Rii RM200 Wireless Mouse,2.4G Wireless Mouse 5 Buttons Rechargeable Mobile Optical Mouse with USB Nano Receiver,3 Adjustable DPI Levels,Colorful LED Lights for Notebook,PC,Computer-Black

product_price: None

product_rating: 4,071 ratings


image link:https://images-na.ssl-images-amazon.com/images/I/41D3bPA0rjL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/41xycao2DvL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif

product_link: https://www.amazon.com/gp/slredirect/picassoRedirect.html/ref=pa_sp_btf_aps_sr_pg1_1?ie=UTF8&adId=A05990032I8VXHPWQ12H5&url=%2FMemzuoix-Wireless-Portable-Receiver-Ergonomic%2Fdp%2FB07WCW1PB3%2Fref%3Dsr_1_22_sspa%3Fcrid%3D22TI4BA3RLK5J%26dchild%3D1%26keywords%3Dwireless%2Bmouse%26qid%3D1599517835%26sprefix%3Dw%252Caps%252C528%26sr%3D8-22-spons%26psc%3D1&qualifier=1600094980&id=1873951770229141&widgetName=sp_btf

product_title: Memzuoix 2.4G Wireless Mouse, Portable Mobile Cordless Mouse with USB Receiver, 1,000 DPI Ergonomic Computer Wireless Mouse for Laptop, Desktop, Mac, PC, 5 Buttons, Red

product_price:  $12.99 

product_rating: 609 ratings


image link:https://images-na.ssl-images-amazon.com/images/I/61JiHsWLn6L._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/51v8m7XkV%2BL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/51S7CTnE5aL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/410fJq0LtLL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif
But my CSV still shows only one product URL for each product.


RE: not getting image src in my BeautifulSoup csv file - buran - Sep-14-2020

You do distinguish between the image URL and the product URL, don't you?
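With the loop as posted, the CSV should contain one row per image, with the same product URL repeated on each row. If what you want instead is one row per product, one option (a sketch, not the only possible layout) is to collect the `src` values first and append them as extra columns. The helper name `product_row` is hypothetical, dropping the transparent-pixel .gif assumes it is a page spacer rather than a product photo, and the sample values are taken from the shell output above (link query string and title shortened).

```python
import csv
import io

# Assumption: the transparent-pixel .gif is a page spacer, not a product photo.
PLACEHOLDER = "transparent-pixel"


def product_row(product_link, title, rating, price, image_urls):
    """One CSV row per product: the product fields, then every image URL (hypothetical helper)."""
    images = [src for src in image_urls if PLACEHOLDER not in src]
    return [product_link, title, rating, price, *images]


image_urls = [
    "https://images-na.ssl-images-amazon.com/images/I/410U75vLdZL._AC_US40_.jpg",
    "https://images-na.ssl-images-amazon.com/images/I/41DuollLgyL._AC_US40_.jpg",
    "https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif",
]
buffer = io.StringIO()  # stands in for the amazon.csv file object
writer = csv.writer(buffer)
writer.writerow(product_row(
    "https://www.amazon.com/VicTsing-Wireless-Receiver-Noiseless-Computer/dp/B073CDQ1Z4",
    "VicTsing Slim Wireless Mouse",
    "9,813 ratings",
    "$8.99",
    image_urls,
))
```

In the scraper, `image_urls` would be built inside the product loop, e.g. `[img['src'] for img in dpsoup.select('span.a-button-text > img')[3:10]]`, and `writerow` called once per product instead of once per image.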