Python Forum

Full Version: Product Image Download Help Required
Hi everyone. I love the Python language and am constantly experimenting with it. I have been researching my latest experiment for 10 days, but I have not been successful.

What my Python code is meant to do, and where it fails: the code should download product images from the BoohooMan website, create a folder for each product, and save the images in those folders. It is also meant to convert the images to base64 format and save them to a file.
The product list I want: https://www.boohooman.com/us/mens/tops/t...20T-Shirts
I need help urgently. Thank you in advance.

Pulling Image URLs from HTML Content:
Issue: When pulling image URLs, we could not properly separate the product IDs from the data-url attribute.

Creating an Image URL Format:
Issue: The image URL format was incompatible with the previous method. The new format needed to include ?$product_image_category_category_category_page_tablet_landscape_pro_2x$.

Downloading and Saving Images:
Issue: The download worked, but the files were not saved in the specified folder. This can often be caused by file path or permission issues.

Saving with Base64:
Issue: Images were not saved in base64 format. The script should have been set to convert the image to base64 and save it to a txt file.
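The saving and base64 steps described above can be tried in isolation, without touching the site. The following is only a minimal sketch under assumptions: the helper names `with_preset` and `save_image_and_base64` are hypothetical, the preset string is the one quoted above (whether the site still accepts it is not verified), and the image bytes are assumed to have been downloaded already.

```python
import base64
from pathlib import Path

# Preset quoted in the post; not verified against the live site.
IMAGE_PRESET = "$product_image_category_category_category_page_tablet_landscape_pro_2x$"


def with_preset(image_url, preset=IMAGE_PRESET):
    """Drop any existing query string and append the preset as the new one."""
    return image_url.split("?", 1)[0] + "?" + preset


def save_image_and_base64(data, folder, stem):
    """Write raw bytes to <stem>.jpg and their base64 text to <stem>.txt."""
    folder = Path(folder)
    # Create the product folder (and any missing parents) before writing.
    folder.mkdir(parents=True, exist_ok=True)
    image_path = folder / f"{stem}.jpg"
    text_path = folder / f"{stem}.txt"
    image_path.write_bytes(data)
    # b64encode returns bytes; decode to ASCII text for the .txt file.
    text_path.write_text(base64.b64encode(data).decode("ascii"), encoding="ascii")
    return image_path, text_path
```

Calling `save_image_and_base64(response.content, product_folder, "image_1")` after a successful download would write both files into the same folder. Returning the two paths makes it easy to print them and confirm where the files actually landed.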
Show us what you've tried, working or not.
Are you also using the HTML to get the full URL path from the website? Some websites block direct access to the picture URL, and the saved extension then becomes .mhtml. What you want is to retrieve the full URL path from your Python code, linking the web page to the image URLs under the proper categories.

What module or file are you using to get this link?
import os
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

# Constants
BASE_URL = "https://www.boohooman.com/us/mens/tops/t-shirts?prefn1=style&prefv1=Printed%20T-Shirts"
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"}
DEST_FOLDER = os.path.join(os.path.expanduser("~"), "Desktop", "BoohooMan_TShirts")

# Create destination folder if it doesn't exist
os.makedirs(DEST_FOLDER, exist_ok=True)

def get_soup(url):
    response = requests.get(url, headers=HEADERS)
    response.raise_for_status()
    return BeautifulSoup(response.text, 'html.parser')

def save_image(image_url, folder_path, image_name):
    response = requests.get(image_url, headers=HEADERS)  # send the same User-Agent as get_soup
    response.raise_for_status()
    with open(os.path.join(folder_path, image_name), 'wb') as f:
        f.write(response.content)

def download_product_images():
    soup = get_soup(BASE_URL)
    products = soup.find_all('div', class_='product-item')

    for product in products:
        product_link = product.find('a', class_='product-item-link')['href']
        product_name = product.find('a', class_='product-item-link').get_text(strip=True)
        product_folder = os.path.join(DEST_FOLDER, product_name)

        # Create a folder for each product
        os.makedirs(product_folder, exist_ok=True)

        product_soup = get_soup(urljoin(BASE_URL, product_link))
        image_elements = product_soup.find_all('img', class_='primary-image')

        for idx, img in enumerate(image_elements):
            img_url = img['src']
            img_url = urljoin(BASE_URL, img_url)
            save_image(img_url, product_folder, f'image_{idx + 1}.jpg')

if __name__ == "__main__":
    download_product_images()
    print(f"Images downloaded and saved in {DEST_FOLDER}")
The site has changed, so you must update and test your code.
For example, this line will not find anything:
products = soup.find_all('div', class_='product-item')
Output:
(Pdb) products
[]
So look at the site (inspect it in dev tools) and start testing for the changes.
import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}
url = 'https://www.boohooman.com/us/mens/tops/t-shirts?prefn1=style&prefv1=Printed%20T-Shirts'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')
products = soup.find_all('div', class_='product-tile js-product-tile')
>>> print(products[0].select_one('.product-tile-name').text.strip())
Oversized Boxy Extended Neck Palm Tree T-shirt
>>> products[0].get('data-itemid')
'BMM85247'
>>> 
>>> print(products[1].select_one('.product-tile-name').text.strip())
Oversized Heavyweight Paisley Applique T-shirt
>>> products[1].get('data-itemid')
'BMM83448'
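The tile name and data-itemid shown above could be combined into one folder name per product. This is only a sketch under assumptions: `safe_folder_name` and `product_folder` are hypothetical helpers, and the choice to prefix the item ID and replace unusual characters with underscores is one option, not something the site or the thread prescribes.

```python
import re
from pathlib import Path


def safe_folder_name(item_id, name):
    """Build a folder name from a tile's data-itemid and its product name."""
    # Replace each run of characters other than letters, digits, '-' and '_'
    # with a single underscore, so the name is safe on common file systems.
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", name).strip("_")
    return f"{item_id}_{cleaned}" if cleaned else item_id


def product_folder(dest, item_id, name):
    """Return the per-product folder path under the destination folder."""
    return Path(dest) / safe_folder_name(item_id, name)


print(safe_folder_name("BMM85247", "Oversized Boxy Extended Neck Palm Tree T-shirt"))
# prints: BMM85247_Oversized_Boxy_Extended_Neck_Palm_Tree_T-shirt
```

Prefixing the item ID keeps folders distinct even if two products share the same display name.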