Python Forum

Full Version: Scraping hex codes
Hi guys,

I've got a Python script which reads product URLs from a CSV file. Each URL opens a product page with a variant already selected.

The script visits each page and should scrape only the selected variant's hex code, but it's scraping all of them.

Can anyone identify why?

Here is the element with the selected variant hex code
[Image: V71gLuv]

Here is my script

import requests
from bs4 import BeautifulSoup
import csv

def get_hex_color(url):
    response = requests.get(url)
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, 'html.parser')
        color_elements = soup.find_all(class_='athenaProductVariations_colorSwatchInner')
        hex_colors = [element.get('style').split(':')[-1].strip() for element in color_elements]
        return hex_colors
    else:
        print(f"Failed to fetch data from {url}")
        return []

def process_urls(input_file, output_file):
    with open(input_file, 'r') as csvfile:
        reader = csv.DictReader(csvfile)
        rows = list(reader)

        for row in rows:
            url = row['URL']
            hex_colors = get_hex_color(url)
            row['Variant Metafield: custom.color [color]'] = ', '.join(hex_colors)

    with open(output_file, 'w', newline='') as outfile:
        fieldnames = rows[0].keys() if rows else []
        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

# Replace 'input.csv' and 'output.csv' with your file names
input_file = 'input.csv'
output_file = 'output.csv'

process_urls(input_file, output_file)
Thanks in advance!
If the swatches all share the same class name, find_all(class_='athenaProductVariations_colorSwatchInner') will return every one of them.
Right-click the selected tag in the browser's developer tools and choose Copy -> Copy selector.
Then in BS you can use select or select_one with that CSS selector to get only this tag.
In the example here both tags have the same class name, but span:nth-child picks out a specific one.
from bs4 import BeautifulSoup

html = '''\
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8">
    <link rel="icon" href="./favicon.ico" type="image/x-icon">
  </head>
  <body>
    <span class="color_style" style="color:blue">blue</span>
    <span class="color_style" style="color:red">red</span>
  </body>
</html>'''

soup = BeautifulSoup(html, 'lxml')
>>> tag = soup.select_one('body > span:nth-child(1)')
>>> tag
<span class="color_style" style="color:blue">blue</span>

>>> tag = soup.select_one('body > span:nth-child(2)')
>>> tag
<span class="color_style" style="color:red">red</span>
>>> tag.get('style')
'color:red'
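Applied to the original script, the same idea might look roughly like the sketch below. The markup and the 'li.selected' part of the selector are made up for illustration, since the real page's marker for the selected variant is only visible in the screenshot; use the selector copied from the browser instead. The helper name get_selected_hex is also hypothetical. It reads the background-color declaration from the style attribute instead of splitting on the last ':', which should hold up better if the style contains more than one declaration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page's "selected" marker is not shown in the thread.
html = '''\
<ul>
  <li><span class="athenaProductVariations_colorSwatchInner" style="background-color: #ff0000"></span></li>
  <li class="selected"><span class="athenaProductVariations_colorSwatchInner" style="background-color: #00ff00"></span></li>
</ul>'''

def get_selected_hex(html, selector):
    """Return the background-color of the first tag matching selector, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    tag = soup.select_one(selector)
    if tag is None:
        return None
    # The style attribute may be missing or hold several declarations.
    style = tag.get('style') or ''
    for declaration in style.split(';'):
        name, _, value = declaration.partition(':')
        if name.strip().lower() == 'background-color':
            return value.strip()
    return None

# Assumed selector; replace with the one from Copy -> Copy selector.
selector = 'li.selected > .athenaProductVariations_colorSwatchInner'
print(get_selected_hex(html, selector))
```

In the original script, get_hex_color could pass response.content to a helper like this and write the single returned value to the CSV row instead of joining a list.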