Get text from within h3 html tags

Pedroski55 · Jan-02-2022, 11:14 PM

Thanks, that is much better than my attempt, gleaned from the internet!

It was a bit harder to extract the definitions, but once I had the terms, I got them by taking the text that was not in the list of business finance terms.
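A minimal sketch of that filtering step, assuming the page text is already split into lines and the terms are already in a list. The function name `split_definitions` and the sample strings are made up for illustration; they are not taken from the page.

```python
def split_definitions(lines, terms):
    """Return the non-empty lines that are not themselves terms, in page order."""
    # A set makes the "is this line a term?" check fast.
    term_set = {term.strip() for term in terms}
    definitions = []
    for line in lines:
        cleaned = line.strip()
        # Keep only real text that does not exactly match a term.
        if cleaned and cleaned not in term_set:
            definitions.append(cleaned)
    return definitions


# Made-up sample data standing in for the scraped page text.
sample_terms = ['Term A', 'Term B']
sample_lines = ['Term A', 'Definition of term A.', '', 'Term B', 'Definition of term B.']

definitions = split_definitions(sample_lines, sample_terms)
print(definitions)
```

Note that this only drops lines that match a term exactly (after stripping whitespace), so a term that shares its line with other text would stay in the definitions.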

I used random to shuffle the list of business finance terms, but I kept the definitions in the order they are on the webpage. Didn't want to make it too difficult!
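The shuffling could look roughly like this. It is a sketch under assumptions: the helper `shuffled_copy`, its optional `seed` parameter, and the placeholder lists are hypothetical, and only `random.Random` and its `shuffle` method come from the standard library.

```python
import random


def shuffled_copy(terms, seed=None):
    """Return a shuffled copy of terms; the original list is left untouched."""
    rng = random.Random(seed)  # seed is optional, only useful for repeatable output
    shuffled = list(terms)
    rng.shuffle(shuffled)  # shuffles the copy in place
    return shuffled


# Made-up placeholders, not the real terms and definitions from the page.
terms = ['Term A', 'Term B', 'Term C', 'Term D']
definitions = ['Definition A', 'Definition B', 'Definition C', 'Definition D']

quiz_terms = shuffled_copy(terms)

print('Terms (shuffled):')
for term in quiz_terms:
    print(' -', term)

# The definitions are printed in their original page order.
print('Definitions (page order):')
for number, definition in enumerate(definitions, start=1):
    print(number, definition)
```

Shuffling a copy instead of the original list keeps the terms in page order as well, which is handy if you later want to check the answers.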

Here is the code I used to save the page text to a file:

import requests
from bs4 import BeautifulSoup
import random

url = 'https://www.fundera.com/blog/business-finance-terms-and-definitions'
res = requests.get(url)
html_page = res.content
soup = BeautifulSoup(html_page, 'html.parser')
# string=True is the current name for the older text=True argument
text = soup.find_all(string=True)

output = ''
blacklist = [
    '[document]',
    'noscript',
    'header',
    'html',
    'meta',
    'head',
    'input',
    'script',
    'style',
    # add any other tag names whose text you want to skip
]

for t in text:
    if t.parent.name not in blacklist:
        output += '{} '.format(t)

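# Drop blank lines and stray whitespace so the saved file is easier to read
text_list = [line.strip() for line in output.split('\n') if line.strip()]
useful_text = '\n'.join(text_list)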
savepath = '/home/pedro/temp/'

with open(savepath + 'biz_definitions.txt', 'w') as f:
    f.write(useful_text)
    
print('All done! Text saved to', savepath + 'biz_definitions.txt')
