Python Forum
not getting image src in my BeautifulSoup csv file
#1
In my Python shell, I get the image src values like this:
image link:https://images-na.ssl-images-amazon.com/images/I/41oJQTxCbZL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/4152DCmmGFL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/41ayV4UraXL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/310z8LQ%2BoYL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif
But in my CSV file, the image src looks like this:
<img alt="" src="https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif"/> 
In the Python shell I get multiple image URLs for each item, but in my CSV file I get only one image URL per item, still wrapped in its HTML tag. The product title, price and rating are written to my CSV correctly, but not all the image URLs for each item. Here is an example of the final output I get in the Python shell:

product_link: https://www.amazon.com/gp/slredirect/picassoRedirect.html/ref=pa_sp_btf_aps_sr_pg1_1?ie=UTF8&adId=A002532917E3JT34GS1DE&url=%2FWireless-Vssoplor-Portable-Computer-Computer-Black%2Fdp%2FB07RLYJJBX%2Fref%3Dsr_1_22_sspa%3Fcrid%3D22TI4BA3RLK5J%26dchild%3D1%26keywords%3Dwireless%2Bmouse%26qid%3D1599517835%26sprefix%3Dw%252Caps%252C528%26sr%3D8-22-spons%26psc%3D1&qualifier=1600050591&id=4126203954910776&widgetName=sp_btf

product_title: Wireless Mouse, Vssoplor 2.4G Slim Portable Computer Mice with Nano Receiver for Notebook, PC, Laptop, Computer-Black and Sapphire Blue

product_price:  $10.99 

product_rating: 2,262 ratings


image link:https://images-na.ssl-images-amazon.com/images/I/41oJQTxCbZL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/4152DCmmGFL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/41ayV4UraXL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/310z8LQ%2BoYL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif  
Here is my full code:
for page_num in range(1):
    url = "https://www.amazon.com/s?k=wireless+mouse&page={}&crid=22TI4BA3RLK5J&qid=1599517835&sprefix=w%2Caps%2C528&ref=sr_pg_2".format(page_num)
    r = requests.get(url,headers=headers,proxies=proxies,auth=auth).text
    soup = BeautifulSoup(r,'lxml')

    container = soup.find_all('h2',{'class':'a-size-mini a-spacing-none a-color-base s-line-clamp-2'})
    for containers in container:
        product_link = f"https://www.amazon.com{containers.find('a')['href']}"
        #print(f"page_number:{url}\n\nproduct_link:{product_link}")

        # from here on, scrape the details page of each product
        details_page = requests.get(product_link,headers=headers,proxies=proxies,auth=auth).text
        dpsoup = BeautifulSoup(details_page,'lxml')

        
        title = dpsoup.find('span', id='productTitle')
        if title is not None:
          title = title.text.strip()
        else:
           title= None
        rating = dpsoup.find('span', id='acrCustomerReviewText')
        if rating is not None:
           rating = rating.text
        else:
           rating = None
        price = dpsoup.find('span', class_='a-size-mini twisterSwatchPrice')
        if price is not None:
           price = price.text
        else:
           price = None
        print(f'\nproduct_link: {product_link}\n\nproduct_title: {title}\n\nproduct_price: {price}\n\nproduct_rating: {rating}\n\n')

        # scrape the src of every gallery image
        for url in dpsoup.select('span.a-button-text > img')[3:10]:
            print(f"image link:{url['src']}")
            with io.open("amazon.csv", "a",encoding="utf-8") as f:
              writeFile = csv.writer(f)
              writeFile.writerow([url,product_link ,title,rating,price]) 
#2
Looking at
print(f"image link:{url['src']}")
vs

writeFile.writerow([url,product_link ,title,rating,price])
Don't you see the difference? url['src'] vs. url
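A minimal, stdlib-only sketch of why the CSV ends up with an <img ...> tag instead of a URL: csv.writer converts any non-string field with str(), and str() of a BeautifulSoup Tag is its HTML markup, while tag['src'] is just the attribute value. FakeImgTag below is a hypothetical stand-in for bs4's Tag (it only mimics item access and str()), so the example runs without BeautifulSoup installed.

```python
import csv
import io


class FakeImgTag:
    """Hypothetical stand-in for a BeautifulSoup <img> Tag.

    Like a real Tag, tag['src'] returns the attribute value and
    str(tag) returns the tag's HTML markup.
    """

    def __init__(self, src):
        self.attrs = {"alt": "", "src": src}

    def __getitem__(self, key):
        return self.attrs[key]

    def __str__(self):
        return f'<img alt="" src="{self.attrs["src"]}"/>'


def row_as_csv(fields):
    """Write one row with csv.writer and return the text it produced."""
    buf = io.StringIO()
    csv.writer(buf).writerow(fields)
    return buf.getvalue()


tag = FakeImgTag("https://example.com/a.jpg")
print(row_as_csv([tag, "product"]))         # the whole tag's HTML (CSV-quoted)
print(row_as_csv([tag["src"], "product"]))  # only the URL
```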

Also, if you only get the last URL, I think that in your actual code, lines 36-38 (the with io.open(...) block that writes the row) are not inside the for loop. It's clear that this is not the code you ran, because the indentation is wrong in several places, e.g. you have 4-space, 3-space and 2-space indentation per level.
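A minimal sketch of the indentation problem, using io.StringIO in place of amazon.csv and a made-up list of URLs: when the write is dedented out of the for loop, it runs once and only sees the loop variable's last value, so one row is written per product. When it is inside the loop, every URL gets its own row.

```python
import csv
import io

# Made-up image URLs standing in for the scraped src values of one product
image_urls = ["img1.jpg", "img2.jpg", "img3.jpg"]


def write_after_loop(urls):
    """Buggy version: the write happens once, after the loop has finished."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for url in urls:
        print(f"image link:{url}")    # every URL is printed...
    writer.writerow([url, "product"])  # ...but only the last one is written
    return buf.getvalue().splitlines()


def write_inside_loop(urls):
    """Fixed version: one row is written per URL."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for url in urls:
        print(f"image link:{url}")
        writer.writerow([url, "product"])
    return buf.getvalue().splitlines()


print(write_after_loop(image_urls))  # ['img3.jpg,product']
print(write_inside_loop(image_urls))
```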
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

#3
I also tried this, but it doesn't work. Please give me a solution:
url = print(f"image link:{url['src']}")
#4
writeFile.writerow([url['src'],product_link ,title,rating,price])
But again, you need to fix the indentation.
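Why the attempt in post #3 can't work: print() always returns None, so url = print(...) throws the Tag away, and csv.writer writes None as an empty field. A small sketch; the URL is a made-up placeholder for url['src'].

```python
import csv
import io

src = "https://example.com/a.jpg"  # placeholder for url['src']

result = print(f"image link:{src}")  # prints the line, but returns None
print(result is None)  # True

buf = io.StringIO()
csv.writer(buf).writerow([result, "product"])
print(repr(buf.getvalue()))  # None was written as an empty first field

buf = io.StringIO()
csv.writer(buf).writerow([src, "product"])  # pass the string itself instead
print(repr(buf.getvalue()))
```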
#5
Now I get the image URL, but only one image URL for each item in my CSV, whereas I get multiple image URLs for each item in my shell. Please see the indentation:

Before, my indentation was:
# scrape the src of every gallery image
        for url in dpsoup.select('span.a-button-text > img')[3:10]:
            print(f"image link:{url['src']}")
            with io.open("amazon.csv", "a",encoding="utf-8") as f:
              writeFile = csv.writer(f)
              writeFile.writerow([url['src'],product_link ,title,rating,price]) 
Now, after your post, my indentation is:
   # scrape the src of every gallery image
        for url in dpsoup.select('span.a-button-text > img')[3:10]:
            print(f"image link:{url['src']}")
        with io.open("amazon.csv", "a",encoding="utf-8") as f:
             writeFile = csv.writer(f)
             writeFile.writerow([url['src'],product_link ,title,rating,price]) 
#6
(Sep-14-2020, 01:58 PM)farhan275 Wrote: now my indentation after your post:
This is exactly the opposite of what I told you to do. Your "original" code had mixed indentation: some levels were 4 spaces, some were 2 spaces. So it was not the actual code you were running (or you would get an IndentationError).
Your code should be:
for url in dpsoup.select('span.a-button-text > img')[3:10]:
    print(f"image link:{url['src']}")
    with io.open("amazon.csv", "a",encoding="utf-8") as f:
        writeFile = csv.writer(f)
        writeFile.writerow([url['src'], product_link, title, rating, price])
Or, even better, to avoid opening and closing the file multiple times:
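# newline="" is what the csv module docs recommend for files passed to csv.writer
with io.open("amazon.csv", "a", newline="", encoding="utf-8") as f:
    writeFile = csv.writer(f)  # create the writer once, outside the loop
    for url in dpsoup.select('span.a-button-text > img')[3:10]:
        print(f"image link:{url['src']}")
        writeFile.writerow([url['src'], product_link, title, rating, price])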
#7
I still have the same problem after fixing the indentation. I get only one image URL for each item in the CSV, whereas in my Python shell I get multiple image URLs for each item. Here is my modified full code:


for page_num in range(1):
    url = "https://www.amazon.com/s?k=wireless+mouse&page={}&crid=22TI4BA3RLK5J&qid=1599517835&sprefix=w%2Caps%2C528&ref=sr_pg_2".format(page_num)
    r = requests.get(url,headers=headers,proxies=proxies,auth=auth).text
    soup = BeautifulSoup(r,'lxml')
    container = soup.find_all('h2',{'class':'a-size-mini a-spacing-none a-color-base s-line-clamp-2'})
    for containers in container:
        product_link = f"https://www.amazon.com{containers.find('a')['href']}"
        #print(f"page_number:{url}\n\nproduct_link:{product_link}")
        # from here on, scrape the details page of each product
        details_page = requests.get(product_link,headers=headers,proxies=proxies,auth=auth).text
        dpsoup = BeautifulSoup(details_page,'lxml')
        title = dpsoup.find('span', id='productTitle')
        if title is not None:
          title = title.text.strip()
        else:
           title= None
        rating = dpsoup.find('span', id='acrCustomerReviewText')
        if rating is not None:
           rating = rating.text
        else:
           rating = None
        price = dpsoup.find('span', class_='a-size-mini twisterSwatchPrice')
        if price is not None:
           price = price.text
        else:
           price = None
        print(f'\nproduct_link: {product_link}\n\nproduct_title: {title}\n\nproduct_price: {price}\n\nproduct_rating: {rating}\n\n')
        # scrape the src of every gallery image
        with io.open("amazon.csv", "a",encoding="utf-8") as f:
           for url in dpsoup.select('span.a-button-text > img')[3:10]:
              print(f"image link:{url['src']}")
              writeFile = csv.writer(f)
              writeFile.writerow([url['src'],product_link ,title,rating,price]) 
                

        
#8
Make sure you saved the file after editing. What you describe doesn't make sense: if it prints all the URLs (the print(...) line), it will write the same URLs to the file (the writeFile.writerow(...) line).
And by the way, it doesn't affect your code, but instead of io.open you can just use open.
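A sketch for checking what actually ended up in the file: read the CSV back with the built-in open() (in Python 3, io.open is the same function) and group the image URLs by product link. It assumes the column order used in the code above (image URL first, product link second); the demo writes a temporary file with made-up rows instead of reading the real amazon.csv.

```python
import csv
import io
import os
import tempfile
from collections import defaultdict

print(io.open is open)  # True: in Python 3, io.open is the built-in open()


def images_per_product(path):
    """Group image URLs (column 0) by product link (column 1)."""
    images = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if len(row) >= 2:  # skip blank or malformed lines
                images[row[1]].append(row[0])
    return dict(images)


# Demo with made-up rows: two images for product-A, one for product-B
rows = [
    ["https://example.com/a1.jpg", "product-A", "Title A", "10 ratings", "$9.99"],
    ["https://example.com/a2.jpg", "product-A", "Title A", "10 ratings", "$9.99"],
    ["https://example.com/b1.jpg", "product-B", "Title B", "5 ratings", ""],
]
with tempfile.NamedTemporaryFile("w", suffix=".csv", newline="",
                                 encoding="utf-8", delete=False) as tmp:
    csv.writer(tmp).writerows(rows)

found = images_per_product(tmp.name)
os.remove(tmp.name)
for link, urls in found.items():
    print(f"{link}: {len(urls)} image(s)")
```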
#9
See my Python shell output; here I get multiple image URLs for every product:

product_link: https://www.amazon.com/Wireless-Uiosmuph-Rechargeable-Portable-Computer/dp/B082M9D31R/ref=sr_1_19?crid=22TI4BA3RLK5J&dchild=1&keywords=wireless+mouse&qid=1599517835&sprefix=w%2Caps%2C528&sr=8-19

product_title: LED Wireless Mouse, Uiosmuph G12 Slim Rechargeable Wireless Silent Mouse, 2.4G Portable USB Optical Wireless Computer Mice with USB Receiver and Type C Adapter (Black)

product_price:  $15.99 

product_rating: 3,801 ratings


image link:https://images-na.ssl-images-amazon.com/images/I/41JjzLoJGzL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/41bedZPDK4L._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/41JpcX8MKfL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/513b4p8dFbL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif

product_link: https://www.amazon.com/VicTsing-Wireless-Receiver-Noiseless-Computer/dp/B073CDQ1Z4/ref=sr_1_20?crid=22TI4BA3RLK5J&dchild=1&keywords=wireless+mouse&qid=1599517835&sprefix=w%2Caps%2C528&sr=8-20

product_title: VicTsing [Upgraded] Slim Wireless Mouse, 2.4G Silent Laptop Mouse- Enjoy Noiseless Clicking, 1600DPI High Accuracy Portable Ergonomic Optical Wireless Mouse for Laptop, PC, Computer, Notebook, Mac

product_price:  $8.99 

product_rating: 9,813 ratings


image link:https://images-na.ssl-images-amazon.com/images/I/410U75vLdZL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/41DuollLgyL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/51XcrXNTWSL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/51m74igY8bL._SS40_BG85,85,85_BR-120_PKdp-play-icon-overlay__.jpg
image link:https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif

product_link: https://www.amazon.com/Rii-Wireless-Rechargeable-Colorful-Mice-Black/dp/B07JHHTGFK/ref=sr_1_21?crid=22TI4BA3RLK5J&dchild=1&keywords=wireless+mouse&qid=1599517835&sprefix=w%2Caps%2C528&sr=8-21

product_title: Rii RM200 Wireless Mouse,2.4G Wireless Mouse 5 Buttons Rechargeable Mobile Optical Mouse with USB Nano Receiver,3 Adjustable DPI Levels,Colorful LED Lights for Notebook,PC,Computer-Black

product_price: None

product_rating: 4,071 ratings


image link:https://images-na.ssl-images-amazon.com/images/I/41D3bPA0rjL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/41xycao2DvL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif

product_link: https://www.amazon.com/gp/slredirect/picassoRedirect.html/ref=pa_sp_btf_aps_sr_pg1_1?ie=UTF8&adId=A05990032I8VXHPWQ12H5&url=%2FMemzuoix-Wireless-Portable-Receiver-Ergonomic%2Fdp%2FB07WCW1PB3%2Fref%3Dsr_1_22_sspa%3Fcrid%3D22TI4BA3RLK5J%26dchild%3D1%26keywords%3Dwireless%2Bmouse%26qid%3D1599517835%26sprefix%3Dw%252Caps%252C528%26sr%3D8-22-spons%26psc%3D1&qualifier=1600094980&id=1873951770229141&widgetName=sp_btf

product_title: Memzuoix 2.4G Wireless Mouse, Portable Mobile Cordless Mouse with USB Receiver, 1,000 DPI Ergonomic Computer Wireless Mouse for Laptop, Desktop, Mac, PC, 5 Buttons, Red

product_price:  $12.99 

product_rating: 609 ratings


image link:https://images-na.ssl-images-amazon.com/images/I/61JiHsWLn6L._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/51v8m7XkV%2BL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/51S7CTnE5aL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/I/410fJq0LtLL._AC_US40_.jpg
image link:https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif
But in my CSV I still get only one product URL for each product.
#10
You do distinguish between the img URL and the product URL, don't you?
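One way to keep the image URL and the product URL apart when looking at the CSV is to give every column a name. Below is a sketch using csv.DictWriter with a header row; the column names and the write_product_images helper are hypothetical choices, not part of the original code, and io.StringIO stands in for amazon.csv. Each product produces one row per image, so the same product link legitimately repeats on several rows.

```python
import csv
import io

# Hypothetical column names for the five values the original code writes
FIELDS = ["image_url", "product_link", "title", "rating", "price"]


def write_product_images(f, image_urls, product_link, title, rating, price,
                         write_header=False):
    """Write one row per image URL; every row repeats the same product details."""
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    if write_header:
        writer.writeheader()
    for src in image_urls:
        writer.writerow({"image_url": src, "product_link": product_link,
                         "title": title, "rating": rating, "price": price})


buf = io.StringIO()
write_product_images(buf, ["https://example.com/1.jpg", "https://example.com/2.jpg"],
                     "https://example.com/product", "Example mouse", "10 ratings", "$9.99",
                     write_header=True)

buf.seek(0)
for row in csv.DictReader(buf):
    print(row["product_link"], "->", row["image_url"])
```

When appending to a real file across runs, you would want to write the header only once, for example only when os.path.exists("amazon.csv") is False before opening it.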