Thread: Extract text between bold headlines from HTML (/thread-20810.html) | Python Forum (https://python-forum.io), Web Scraping & Web Development
Extract text between bold headlines from HTML - CostasG - Aug-31-2019

I need to extract text from company transcripts. The files are in HTML format, saved locally on my PC. What I need to do is extract each executive's text. To do this, I would like code that extracts the text following each executive's name (which is in bold). Each executive appears many times in the file, so I would like the text of each executive grouped together.

I have found a solution to a similar problem, but I do not know how to adapt it to my case, as I am really new to Python: https://stackoverflow.com/questions/37743401/python-pulling-bold-text-and-the-text-that-follows

A sample file can be found here: https://www.dropbox.com/sh/ak5cxp4p7bxg5kq/AAA_FIPLfvEnmi5N_QpkFyR_a?dl=0

If anyone could help with this, I would greatly appreciate it.

RE: Extract text between bold headlines from HTML - snippsat - Aug-31-2019

You can start by looking at Web-Scraping part-1.
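
As a rough illustration of the grouping described in the question, here is a minimal sketch that uses only the standard library's `html.parser`. It is not the approach from the linked Stack Overflow answer. It assumes each speaker name is wrapped in a `<b>` or `<strong>` tag and that everything up to the next bold tag belongs to that speaker; the real transcript layout is not known here, so the tag names may need adjusting. `SpeakerParser`, `group_by_speaker`, and the sample names and sentences are all made up for the example.

```python
from collections import defaultdict
from html.parser import HTMLParser


class SpeakerParser(HTMLParser):
    """Collect the text that follows each bold name, grouped by name."""

    # Assumption: speaker names are marked up with one of these tags.
    BOLD_TAGS = {"b", "strong"}

    def __init__(self):
        super().__init__()
        self.in_bold = False        # True while inside a bold tag
        self.bold_parts = []        # text pieces of the current bold name
        self.current = None         # speaker whose text is being collected
        self.by_speaker = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag in self.BOLD_TAGS:
            self.in_bold = True
            self.bold_parts = []

    def handle_endtag(self, tag):
        if tag in self.BOLD_TAGS and self.in_bold:
            self.in_bold = False
            # Normalise whitespace so the same name always maps to one key.
            name = " ".join("".join(self.bold_parts).split())
            if name:
                self.current = name

    def handle_data(self, data):
        if self.in_bold:
            self.bold_parts.append(data)
            return
        text = " ".join(data.split())
        # Text before the first bold name has no speaker and is skipped.
        if text and self.current is not None:
            self.by_speaker[self.current].append(text)


def group_by_speaker(html):
    """Return a dict mapping each bold name to all text that follows it."""
    parser = SpeakerParser()
    parser.feed(html)
    parser.close()
    return {name: " ".join(parts) for name, parts in parser.by_speaker.items()}


# Made-up sample standing in for a transcript file.
sample = """
<p><b>Jane Doe</b></p>
<p>Revenue grew this quarter.</p>
<p><b>John Roe</b></p>
<p>Costs were flat.</p>
<p><b>Jane Doe</b></p>
<p>We expect that to continue.</p>
"""

grouped = group_by_speaker(sample)
for name, text in grouped.items():
    print(f"{name}: {text}")
```

For a file saved locally, the HTML string could be read with `open(path, encoding="utf-8").read()` and passed to `group_by_speaker`. If the transcripts also use bold for things other than speaker names (section headings, emphasis), those would be treated as speakers too and would need filtering.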