Python Forum
Extract text between bold headlines from HTML - Printable Version

Extract text between bold headlines from HTML - CostasG - Aug-31-2019

I need to extract text from company transcripts. The files are in HTML format and saved locally on my PC. What I need is each executive's text. To do this, I would like code that extracts the text following each executive's name, which is in bold. Each executive appears many times in the file, so I would like each executive's text grouped together.
I have found a solution to a similar problem, but I do not know how to adapt it to my case as I am really new to Python:
https://stackoverflow.com/questions/37743401/python-pulling-bold-text-and-the-text-that-follows

A sample file can be found here:
https://www.dropbox.com/sh/ak5cxp4p7bxg5kq/AAA_FIPLfvEnmi5N_QpkFyR_a?dl=0

If anyone could help with this, I would greatly appreciate it.


RE: Extract text between bold headlines from HTML - snippsat - Aug-31-2019

You can start by looking at the Web-Scraping part-1 tutorial.
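
As a rough starting point, here is a minimal sketch of the grouping idea using only the standard library's `html.parser` (a swapped-in choice here; BeautifulSoup, as in the linked Stack Overflow answer, would work too). It assumes each name is wrapped in `<b>` or `<strong>` and that everything up to the next bold tag belongs to that speaker. The sample HTML and the names in it are made up for illustration, and the real transcript files may be structured differently.

```python
from collections import defaultdict
from html.parser import HTMLParser


class SpeakerTextParser(HTMLParser):
    """Collect the text that follows each bold name, keyed by that name."""

    # Assumption: speaker names are marked up with one of these tags.
    BOLD_TAGS = {"b", "strong"}

    def __init__(self):
        super().__init__()
        self.bold_depth = 0        # > 0 while inside a bold tag
        self.name_parts = []       # pieces of the name currently being read
        self.current = None        # speaker whose text is being collected
        self.texts = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag in self.BOLD_TAGS:
            if self.bold_depth == 0:
                self.name_parts = []
            self.bold_depth += 1

    def handle_endtag(self, tag):
        if tag in self.BOLD_TAGS and self.bold_depth > 0:
            self.bold_depth -= 1
            if self.bold_depth == 0:
                # Normalise whitespace so the same name always gives the same key.
                name = " ".join("".join(self.name_parts).split())
                if name:
                    self.current = name

    def handle_data(self, data):
        if self.bold_depth > 0:
            self.name_parts.append(data)
        elif self.current is not None:
            text = " ".join(data.split())
            if text:
                self.texts[self.current].append(text)


def group_by_speaker(html):
    """Return a dict mapping each bold name to all text that followed it."""
    parser = SpeakerTextParser()
    parser.feed(html)
    parser.close()
    return {name: " ".join(parts) for name, parts in parser.texts.items()}


# Made-up sample; replace with the contents of a real transcript file.
sample = """
<p><b>Jane Doe</b></p><p>Welcome to the call.</p>
<p><b>John Roe</b></p><p>Revenue grew this quarter.</p>
<p><b>Jane Doe</b></p><p>Thank you, John.</p>
"""
result = group_by_speaker(sample)
for name, text in result.items():
    print(f"{name}: {text}")
```

For a local file, something like `group_by_speaker(open("transcript.html", encoding="utf-8").read())` should work, where `transcript.html` is a placeholder file name. Note that any other bold text in the file (section titles, for example) would also be treated as a speaker name, so the result may need filtering against a list of known executives.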