Python Forum
Any way to remove HTML tags from scraped data? (I want text only)
#1
Hello everyone!

I would like to thank you in advance for looking at my thread, and trying to resolve my issue.

What I'm trying to do is scrape the current price of gold (per ounce) from a website. My code pulls the data correctly, but the printed result still includes the HTML tags.

I've spent countless hours Googling to try and fix this, but I cannot resolve it. You guys are my only hope haha!

Here's the code I'm using to scrape the page:

#imports required modules
import requests
from bs4 import BeautifulSoup
#requests html page to parse
page = requests.get("https://www.bullionbypost.co.uk/")
#parses page and stores it in the 'soup' variable
soup = BeautifulSoup(page.content, 'html.parser')
#searches for tags in the HTML
results = soup.find_all("span", {"class": "gold-price-per-ounce"})
#prints results from the executed code above
print(results)
This is what the program returns:
Output:
[<span class="gold-price-per-ounce">£1468.80</span>]
As mentioned above, the desired result is to print only the text string containing the gold price (without the HTML tags).

Thanks again!
Larz60+ wrote Nov-02-2020, 08:08 PM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.

I added the tags for you this time. Please use BBCode tags in future posts. Thank you.
#2
# Import the required modules
import requests
from bs4 import BeautifulSoup
# Request the HTML page to parse
page = requests.get("https://www.bullionbypost.co.uk/")
# Parse the page and store it in the 'soup' variable
soup = BeautifulSoup(page.content, 'html.parser')
# find_all() returns a list-like ResultSet of Tag objects,
# which is why print(results) shows the markup
results = soup.find_all("span", {"class": "gold-price-per-ounce"})
# .text on a single Tag gives only the text inside it, without the HTML tags
print(f"Gold price per ounce is: {results[0].text}")
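
For what it's worth, here is a slightly more defensive sketch of the same idea. It uses find() to get the first matching tag and get_text(strip=True) to pull out the text, and it returns None instead of raising an error if the span is not on the page. The extract_price helper name is made up for illustration, and the sample markup is just the span from the output in post #1, so this runs without hitting the website.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the output in post #1 (no network request needed).
html = '<span class="gold-price-per-ounce">£1468.80</span>'


def extract_price(markup):
    """Hypothetical helper: return the text of the first gold-price span, or None."""
    soup = BeautifulSoup(markup, "html.parser")
    # find() returns a single Tag, or None when nothing matches
    tag = soup.find("span", class_="gold-price-per-ounce")
    if tag is None:
        return None
    # get_text(strip=True) returns the text inside the tag, minus surrounding whitespace
    return tag.get_text(strip=True)


price = extract_price(html)
print(f"Gold price per ounce is: {price}")
```

With the live site you would pass page.content in place of html. The None check matters there because the class name could change or the request could return an error page.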