Python Forum

Any way to remove HTML tags from scraped data? (I want text only)
Hello everyone!

Thank you in advance for looking at my thread and trying to resolve my issue.

I'm trying to scrape the current price of gold (per ounce) from a website. My code pulls the data correctly, but the printed result still includes the HTML tags.

I've spent countless hours Googling to try and fix this, but I cannot resolve it. You guys are my only hope haha!

Here's the code I'm using to scrape:

#imports required modules
import requests
from bs4 import BeautifulSoup
#requests html page to parse
page = requests.get("https://www.bullionbypost.co.uk/")
#parses page and stores it in the 'soup' variable
soup = BeautifulSoup(page.content, 'html.parser')
#searches for tags in the HTML
results = soup.find_all("span", {"class": "gold-price-per-ounce"})
#prints results from the executed code above
print(results)
This is what the program returns:
Output:
[<span class="gold-price-per-ounce">£1468.80</span>]
As mentioned above, I want to print just the text string containing the gold price, without the HTML tags.

Thanks again!
#imports required modules
import requests
from bs4 import BeautifulSoup
#requests html page to parse
page = requests.get("https://www.bullionbypost.co.uk/")
#parses page and stores it in the 'soup' variable
soup = BeautifulSoup(page.content, 'html.parser')
#searches for tags in the HTML
results = soup.find_all("span", {"class": "gold-price-per-ounce"})
#prints only the text of the first match, without the HTML tags
print(f"Gold price per ounce is: {results[0].text}")
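
For anyone finding this thread later: the tags show up because find_all() returns a list of Tag objects, and printing that list prints each tag's HTML. Reading .text (or calling get_text()) on a single tag gives only the string inside it. Below is a minimal offline sketch of the same idea. The SAMPLE_HTML string and the extract_price_text helper are made up for illustration and are not part of the original code. It also assumes you only want the first match, so it uses find() and checks for None in case the site changes its markup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
SAMPLE_HTML = '<div><span class="gold-price-per-ounce"> £1468.80 </span></div>'


def extract_price_text(html):
    """Return the text of the first gold-price span, or None if it is missing."""
    soup = BeautifulSoup(html, "html.parser")
    # find() returns the first matching Tag or None; find_all() returns a list.
    tag = soup.find("span", class_="gold-price-per-ounce")
    if tag is None:
        return None
    # get_text(strip=True) drops the markup and any surrounding whitespace.
    return tag.get_text(strip=True)


print(extract_price_text(SAMPLE_HTML))
print(extract_price_text("<p>no price here</p>"))
```

With the live site you would pass page.content instead of SAMPLE_HTML. The None check avoids the IndexError that results[0] raises when nothing matches.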