Jan-22-2020, 06:58 PM
I want to extract all the "executives" from a directory where I have stored my downloaded HTML files. The directory holds approximately 1,000 HTML files, which should contain the div id=article_participants element (files without it can be ignored):
Quote:<DIV id=article_participants class="content_part hid">
<P>Redhill Biopharma Ltd. (NASDAQ:<A title="" href="http://seekingalpha.com/symbol/rdhl" symbolSlug="RDHL">RDHL</A>)</P>
<P>Q4 2014 <SPAN class=transcript-search-span style="BACKGROUND-COLOR: yellow">Earnings</SPAN> Conference <SPAN class=transcript-search-span style="BACKGROUND-COLOR: #f38686">Call</SPAN></P>
<P>February 26, 2015 9:00 AM ET</P>
<P><STRONG>Executives</STRONG></P>
<P>Dror Ben Asher - CEO</P>
<P>Ori Shilo - Deputy CEO, Finance and Operations</P>
<P>Guy Goldberg - Chief Business Officer</P>
<P><STRONG>Analysts</STRONG></P>
My output would need to be Name, Function, Period, Symbol:
Example: Ori Shilo | Deputy CEO, Finance and Operations | Q4 2014 | RDHL
I tried the following, but it's not sufficient:
import textwrap
import os
from bs4 import BeautifulSoup

directory = 'C:/Research syntheses - Meta analysis/SeekingAlpha/out'
for filename in os.listdir(directory):
    if filename.endswith('.html'):
        fname = os.path.join(directory, filename)
        with open(fname, 'r') as f:
            soup = BeautifulSoup(f.read(), 'html.parser')
            print('{:<30} {:<70}'.format('Name', 'Answer'))
            print('-' * 101)

Can someone help me?
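One possible way to approach this is sketched below. It is not a definitive solution and rests on assumptions drawn only from the quoted snippet: that the ticker is the text of the first link inside the div, that the period appears as "Q<n> <year>", that section headings are the paragraphs containing a STRONG tag, and that name and function are separated by " - ". The function names extract_executives and scan_directory are made up for illustration, and the UTF-8 encoding is a guess.

```python
import os
import re
from bs4 import BeautifulSoup


def extract_executives(html):
    """Return (name, function, period, symbol) tuples for one page.

    Returns an empty list when the page has no div with
    id="article_participants", so such files are simply skipped.
    """
    soup = BeautifulSoup(html, 'html.parser')
    div = soup.find('div', id='article_participants')
    if div is None:
        return []

    # Assumption: the first link inside the div is the ticker link.
    link = div.find('a')
    symbol = link.get_text(strip=True) if link else ''

    # Assumption: the period is written like "Q4 2014" somewhere in the div.
    match = re.search(r'\bQ[1-4]\s+\d{4}\b', div.get_text(' ', strip=True))
    period = match.group(0) if match else ''

    rows = []
    in_executives = False
    for p in div.find_all('p'):
        text = p.get_text(' ', strip=True)
        if p.find('strong'):
            # A paragraph with bold text is treated as a section heading
            # ("Executives", "Analysts", ...).
            in_executives = text.lower() == 'executives'
            continue
        if in_executives:
            # Split "Name - Function" on the first hyphen or en dash
            # that is surrounded by whitespace.
            parts = re.split(r'\s+[-\u2013]\s+', text, maxsplit=1)
            if len(parts) == 2:
                rows.append((parts[0].strip(), parts[1].strip(), period, symbol))
    return rows


def scan_directory(directory):
    """Print one 'Name | Function | Period | Symbol' line per executive."""
    for filename in sorted(os.listdir(directory)):
        if filename.endswith('.html'):
            path = os.path.join(directory, filename)
            # Assumption: files are UTF-8; undecodable bytes are replaced.
            with open(path, encoding='utf-8', errors='replace') as f:
                for row in extract_executives(f.read()):
                    print(' | '.join(row))
```

Calling scan_directory('C:/Research syntheses - Meta analysis/SeekingAlpha/out') would then print one line per executive, for example "Ori Shilo | Deputy CEO, Finance and Operations | Q4 2014 | RDHL" for the snippet quoted above. Pages that lay out the participants differently would need the heading or separator rules adjusted.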