Python Forum
Get text from within h3 html tags
#3
Thanks, that is much better than my own attempt, which I gleaned from the internet!

It was a bit harder to extract the definitions, but once I had the terms, I got the definitions by taking all the text that was not in the list of business finance terms.
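A minimal sketch of that filtering step, assuming the page text has already been collected as a list of strings (for example from `soup.find_all(string=True)`) and the terms as a separate list. The function name `extract_definitions` and the sample data are hypothetical, not taken from the original script:

```python
def extract_definitions(text_chunks, terms):
    """Return the non-empty text chunks that are not themselves terms."""
    term_set = {t.strip() for t in terms}  # set for fast membership tests
    definitions = []
    for chunk in text_chunks:
        chunk = chunk.strip()
        # Skip whitespace-only chunks and the term headings themselves
        if chunk and chunk not in term_set:
            definitions.append(chunk)
    return definitions


# Hypothetical sample data standing in for the scraped page text
terms = ["Accounts payable", "Accounts receivable"]
text_chunks = [
    "Accounts payable",
    "Money a business owes to its suppliers.",
    "\n",
    "Accounts receivable",
    "Money owed to a business by its customers.",
]

print(extract_definitions(text_chunks, terms))
# ['Money a business owes to its suppliers.', 'Money owed to a business by its customers.']
```

Using a set for the terms keeps each membership test fast. One caveat: any piece of definition text that exactly matches a term would also be dropped.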

I used random to shuffle the list of business finance terms, but I kept the definitions in the order they appear on the webpage. I didn't want to make it too difficult!
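One way to do that, sketched with hypothetical names (`make_matching_quiz`, the sample terms and definitions) and assuming every term is unique and `definitions[i]` belongs to `terms[i]`: shuffle a copy of the terms and record where each term's definition sits in the unshuffled list.

```python
import random


def make_matching_quiz(terms, definitions, seed=None):
    """Shuffle the terms; keep the definitions in their original page order.

    Returns the shuffled terms and an answer key mapping each term to the
    position of its definition. Assumes terms are unique and that
    definitions[i] belongs to terms[i].
    """
    if len(terms) != len(definitions):
        raise ValueError("need exactly one definition per term")
    rng = random.Random(seed)  # seeded generator makes the order repeatable
    shuffled = list(terms)     # copy, so the caller's list is not reordered
    rng.shuffle(shuffled)
    answer_key = {term: terms.index(term) for term in shuffled}
    return shuffled, answer_key


terms = ["Accounts payable", "Accounts receivable", "Equity"]  # hypothetical
definitions = [
    "Money a business owes to its suppliers.",
    "Money owed to a business by its customers.",
    "The owners' stake in the business.",
]

shuffled, answer_key = make_matching_quiz(terms, definitions, seed=1)
for n, term in enumerate(shuffled, 1):
    print(f"{n}. {term}")
for i, definition in enumerate(definitions):
    print(f"{chr(ord('a') + i)}) {definition}")
```

Passing a `seed` makes the shuffle repeatable, which helps when checking answers; leave it as `None` to get a different order on each run.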

import os

import requests
from bs4 import BeautifulSoup

url = 'https://www.fundera.com/blog/business-finance-terms-and-definitions'
res = requests.get(url, timeout=30)
res.raise_for_status()  # stop early on a 4xx/5xx response
soup = BeautifulSoup(res.content, 'html.parser')

# Every text node in the page ('string' replaces the older 'text' argument)
text = soup.find_all(string=True)

# Skip text whose parent element is not visible page content
blacklist = [
    '[document]',
    'noscript',
    'header',
    'html',
    'meta',
    'head',
    'input',
    'script',
    'style',
    # add any other elements you don't want here
]

output = ''
for t in text:
    if t.parent.name not in blacklist:
        output += '{} '.format(t)

savepath = '/home/pedro/temp/'
outfile = os.path.join(savepath, 'biz_definitions.txt')

with open(outfile, 'w', encoding='utf-8') as f:
    f.write(output)

print('All done! Text saved to', outfile)
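For the original question, pulling just the text inside `<h3>` tags, here is a stdlib-only sketch that uses `html.parser.HTMLParser` as an alternative to BeautifulSoup (a swapped-in technique). The class name `H3TextParser`, the helper `h3_texts`, and the sample markup are hypothetical:

```python
from html.parser import HTMLParser


class H3TextParser(HTMLParser):
    """Collect the text inside every <h3>...</h3> element."""

    def __init__(self):
        super().__init__()
        self.in_h3 = False
        self.current = []   # text pieces of the heading being read
        self.headings = []  # finished heading strings

    def handle_starttag(self, tag, attrs):
        if tag == "h3":  # HTMLParser passes tag names in lowercase
            self.in_h3 = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == "h3" and self.in_h3:
            self.in_h3 = False
            # Join the pieces and collapse runs of whitespace
            self.headings.append(" ".join("".join(self.current).split()))

    def handle_data(self, data):
        if self.in_h3:
            self.current.append(data)


def h3_texts(html):
    parser = H3TextParser()
    parser.feed(html)
    parser.close()
    return parser.headings


sample = """
<h3>Accounts <a href="#ap">payable</a></h3>
<p>Money a business owes to its suppliers.</p>
<h3>
  Equity
</h3>
"""
print(h3_texts(sample))  # ['Accounts payable', 'Equity']
```

Text inside nested tags, such as a link within the heading, is still collected, because `handle_data` fires for it while the parser is between `<h3>` and `</h3>`.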


Messages In This Thread
Get text from within h3 html tags - by Pedroski55 - Jan-02-2022, 12:55 AM
RE: Get text from within h3 html tags - by Pedroski55 - Jan-02-2022, 11:14 PM
RE: Get text from within h3 html tags - by Larz60+ - Jan-04-2022, 05:45 AM
RE: Get text from within h3 html tags - by Larz60+ - Jan-05-2022, 06:50 AM
