Python Forum

Extract text between bold headlines from HTML
I need to extract text from company transcripts. The files are in HTML format and saved locally on my PC. I want to extract each executive's remarks: specifically, I would like code that extracts the text following each executive's name, which appears in bold. Each executive speaks many times in the file, so I would like all of each executive's text grouped together.
I have found a solution to a similar problem, but I do not know how to adapt it to my case, as I am very new to Python:

A sample file can be found here:

If anyone could help with this, I would greatly appreciate it.
You can start by looking at this tutorial: Web-Scraping part-1.
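Since the sample file isn't available here, this is only a minimal sketch under stated assumptions. It uses Python's standard-library `html.parser` instead of third-party scraping libraries, and it assumes each executive's name is wrapped in a `<b>` or `<strong>` tag and that everything up to the next bold tag is that person's speech. The names `SpeakerTextParser`, `group_by_speaker` and `group_file_by_speaker` are hypothetical. If the real transcripts also bold other things (section headings, for example), those will be treated as speakers too.

```python
from collections import defaultdict
from html.parser import HTMLParser
from pathlib import Path


class SpeakerTextParser(HTMLParser):
    """Collect the text that follows each bold name, grouped by speaker.

    Assumption: a speaker's name sits inside <b> or <strong>, and all text
    up to the next bold tag belongs to that speaker.
    """

    BOLD_TAGS = {"b", "strong"}

    def __init__(self):
        super().__init__()
        self.in_bold = 0              # nesting depth of bold tags
        self.bold_buffer = []         # text pieces of the current bold run
        self.current_speaker = None   # last bold name seen
        self.speech = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag in self.BOLD_TAGS:
            self.in_bold += 1
            if self.in_bold == 1:
                self.bold_buffer = []

    def handle_endtag(self, tag):
        if tag in self.BOLD_TAGS and self.in_bold:
            self.in_bold -= 1
            if self.in_bold == 0:
                # Normalise whitespace so "Jane  Doe" and "Jane Doe" match.
                name = " ".join("".join(self.bold_buffer).split())
                if name:
                    self.current_speaker = name

    def handle_data(self, data):
        if self.in_bold:
            self.bold_buffer.append(data)
        elif self.current_speaker is not None:
            text = " ".join(data.split())
            if text:  # skip whitespace-only runs between tags
                self.speech[self.current_speaker].append(text)


def group_by_speaker(html):
    """Return {speaker name: all of that speaker's text joined together}."""
    parser = SpeakerTextParser()
    parser.feed(html)
    parser.close()
    return {name: " ".join(parts) for name, parts in parser.speech.items()}


def group_file_by_speaker(path, encoding="utf-8"):
    """Read a locally saved HTML transcript (encoding is an assumption)."""
    return group_by_speaker(Path(path).read_text(encoding=encoding))
```

For a file on disk, something like `group_file_by_speaker("transcript.html")` (a hypothetical filename) returns one entry per executive, in the order they first speak. Text inside `<script>` or `<style>` elements would also be collected by this sketch, so check the output against your actual files.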