Thread: How can I ignore empty fields when scrapping (/thread-36362.html), Python Forum (https://python-forum.io)
How can I ignore empty fields when scrapping - never5000 - Feb-11-2022

I'm gathering card data for a game I like. Some cards don't have certain text and others do. The script searches for it, and when it comes across a card with missing data it just breaks and doesn't continue. I need to ignore a few sections, since not all cards have weakness, flavor text, evolution info, etc.

There are about 230 cards to scrape. The code below stops after 6, as the 7th card on the page doesn't have "flavor" text. If I comment out "flavor", it scrapes about 126 cards, as the card it stops on doesn't have "att" or "weak" etc.

So I need to tell the script: if you come across something missing, just ignore it and move on. But I don't know how to do this.

Here is my code:

    from bs4 import BeautifulSoup
    import requests, openpyxl

    excel = openpyxl.Workbook()
    print(excel.sheetnames)
    sheet = excel.active
    sheet.title = "Pokemon Cards"
    print(excel.sheetnames)
    sheet.append(['title', 'slug', 'sku', 'category_id', 'price', 'discount_rate', 'vat_rate', 'stock', 'description', 'image_url', 'external_link'])

    try:
        source = requests.get('https://pkmncards.com/set/chilling-reign/?sort=date&ord=auto&display=full')
        source.raise_for_status()
        soup = BeautifulSoup(source.text, 'html.parser')
        cards = soup.find_all(class_="entry")
        for card in cards:
            #title = card.find(class_="card-title")
            title = card.find('h2').span.text
            details = card.find(class_='card-tabs').text
            image_url = card.find(class_='card-image-area').a
            price = card.find(class_='m').span.text
            name = card.find(class_='name-hp-color').text
            att = card.find(class_="tab").find(class_="text").text
            evol = card.find(class_='type-evolves-is').text
            weak = card.find(class_='weak-resist-retreat')
            ill = card.find(class_='illus minor-text').text
            release = card.find(class_='release-meta minor-text').text
            stan = card.find(class_='mark-formats minor-text').text
            flavor = card.find(class_='flavor minor-text').text
            slug = ""
            sku = ""
            category_id = "50"
            discount_rate = ""
            vat_rate = ""
            stock = "4"
            external_link = ""
            #description1 = "<b>Card Name</b> " + name + " <br> " + evol + " <br> " + att + " <br> " + weak + " <br> " + ill + " <br> " + release + " <br> "+ stan + " <br> " + " <br><br> " + "All Prices are subject to change please message me for more details."
            #description = description1.replace("Pokémon", "Pokemon").replace("×", "x").replace(" → ", " > ").replace("⇢", ">").replace("↘", ">").replace(" · ", " - ").replace(" › ", " > ").replace("’", "'")
            print(title)
            #print(title, description, image_url.get('href'), price)
            #sheet.append([title, slug, sku, category_id, price, discount_rate, vat_rate, stock, description, image_url.get('href'), external_link])
    except Exception as e:
        print(e)

    #excel.save('chill_all4.xlsx')
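
For context on why the loop stops: BeautifulSoup's `find` returns `None` when nothing matches, so accessing `.text` on that result raises `AttributeError`. Because the `try`/`except` in the code above wraps the whole loop, the first such error ends the scrape. Below is a minimal sketch of one way around it, assuming an empty string is an acceptable value for a missing field. `find_text` and `find_attr` are hypothetical helper names introduced here for illustration; they are not part of BeautifulSoup.

```python
def find_text(parent, *args, default="", **kwargs):
    """Return the text of the first match under `parent`, or `default`.

    `parent` is assumed to be a BeautifulSoup Tag (or None). The extra
    arguments are passed straight through to `parent.find(...)`.
    """
    if parent is None:
        # Lets chained lookups pass through a missing outer element.
        return default
    node = parent.find(*args, **kwargs)
    if node is None:
        # Nothing matched: return the default instead of raising.
        return default
    return node.get_text()


def find_attr(parent, attr, *args, default="", **kwargs):
    """Return attribute `attr` of the first match under `parent`, or `default`."""
    if parent is None:
        return default
    node = parent.find(*args, **kwargs)
    if node is None:
        return default
    return node.get(attr, default)
```

With these, a line such as `flavor = card.find(class_='flavor minor-text').text` would become `flavor = find_text(card, class_='flavor minor-text')`, and the chained lookup for `att` would become `find_text(card.find(class_="tab"), class_="text")`. The link could be read with `find_attr(card.find(class_='card-image-area'), 'href', 'a')`. A card with no flavor text then yields an empty string and the loop carries on to the next card.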