 Extract text between bold headlines from HTML
I need to extract text from company transcripts. The files are in HTML format and saved locally on my PC. What I need is the text spoken by each executive. To do this, I would like code that extracts the text following each executive's name (which is in bold). Each executive appears many times in a file, so I would like each executive's text grouped together.
I have found a solution to a similar problem, but I do not know how to adapt it to my case, as I am really new to Python:

A sample file can be found here:

If anyone could help with this, I would greatly appreciate it.
You can start by looking at Web-Scraping part-1.
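
In the meantime, here is a rough sketch of the grouping idea from the question. It is not taken from the tutorial and it has not been run against the sample file, so treat it as a starting point only. It assumes each executive's name is wrapped in a `<b>` or `<strong>` tag and that everything up to the next bold headline belongs to that speaker. The names `SpeakerTextParser` and `group_text_by_speaker` and the sample HTML are made up for illustration. It uses only the standard library's `html.parser`; BeautifulSoup would work just as well.

```python
from collections import defaultdict
from html.parser import HTMLParser


class SpeakerTextParser(HTMLParser):
    """Collect text under the most recent bold headline (assumed to be a name)."""

    BOLD_TAGS = {"b", "strong"}  # assumption: names use one of these tags

    def __init__(self):
        super().__init__()
        self.in_bold = False
        self.bold_buffer = []
        self.current_speaker = None
        self.texts = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag in self.BOLD_TAGS:
            self.in_bold = True
            self.bold_buffer = []

    def handle_endtag(self, tag):
        if tag in self.BOLD_TAGS and self.in_bold:
            self.in_bold = False
            # Normalise whitespace so "Jane  Doe" and "Jane Doe" match.
            name = " ".join("".join(self.bold_buffer).split())
            if name:
                self.current_speaker = name

    def handle_data(self, data):
        if self.in_bold:
            self.bold_buffer.append(data)
            return
        text = " ".join(data.split())
        # Text before the first bold headline has no speaker and is skipped.
        if text and self.current_speaker is not None:
            self.texts[self.current_speaker].append(text)


def group_text_by_speaker(html):
    """Return {speaker name: all of that speaker's text joined together}."""
    parser = SpeakerTextParser()
    parser.feed(html)
    parser.close()
    return {name: " ".join(parts) for name, parts in parser.texts.items()}


# Made-up sample; a real transcript will likely need extra filtering.
sample = """
<p><strong>Jane Doe</strong></p>
<p>Welcome to the call.</p>
<p><strong>John Roe</strong></p>
<p>Revenue grew this quarter.</p>
<p><strong>Jane Doe</strong></p>
<p>Thank you, John.</p>
"""

grouped = group_text_by_speaker(sample)
for name, text in grouped.items():
    print(name, "->", text)
```

For a file saved on your PC, read it with `open(path, encoding="utf-8").read()` and pass the result to `group_text_by_speaker`. If the transcript also uses bold for things other than names (section titles, for example), you would need to filter those out, for instance by checking each bold headline against a list of known executives.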
