Parse data from downloaded html

nikos48 · Jan-22-2020, 06:58 PM

I want to extract all the "executives" from a directory where I have stored my downloaded HTML files. The directory holds approximately 1,000 HTML files, which should contain the div id=article_participants element (files without it can be ignored):

Quote:<DIV id=article_participants class="content_part hid">
Redhill Biopharma Ltd. (NASDAQ:<A title="" href="http://seekingalpha.com/symbol/rdhl" symbolSlug="RDHL">RDHL</A>)
Q4 2014 Earnings Conference Call
February 26, 2015 9:00 AM ET
Executives
Dror Ben Asher - CEO
Ori Shilo - Deputy CEO, Finance and Operations
Guy Goldberg - Chief Business Officer
Analysts

My output needs to be Name, Function, Period, Symbol.
Example: Ori Shilo | Deputy CEO, Finance and Operations | Q4 2014 | RDHL
I tried the following, but it's not sufficient:

import os
from bs4 import BeautifulSoup

directory = 'C:/Research syntheses - Meta analysis/SeekingAlpha/out'
for filename in os.listdir(directory):
    if filename.endswith('.html'):
        fname = os.path.join(directory, filename)
        # Each file is parsed, but nothing is extracted from soup yet
        with open(fname, 'r') as f:
            soup = BeautifulSoup(f.read(), 'html.parser')

# Header for the intended output table
print('{:<30} {:<70}'.format('Name', 'Answer'))
print('-' * 101)

Can someone help me?
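
One possible approach, as a rough sketch rather than a tested solution for all 1,000 files: find the article_participants div, take the ticker from the first link inside it, pick up the period with a regular expression, and split every line between "Executives" and "Analysts" on " - ". The function names parse_executives and parse_directory are made up for this example. The sketch assumes every file looks like the quoted sample (one executive per line, a "Q4 2014"-style period somewhere in the div, and UTF-8 encoding), so the real downloads may need adjustments.

```python
import re
from pathlib import Path

from bs4 import BeautifulSoup

# Assumption: the period always appears as "Q<1-4> <year>", as in the sample.
PERIOD_RE = re.compile(r"\bQ[1-4]\s+\d{4}\b")
# "Name - Function", separated by a hyphen or an en dash.
SPLIT_RE = re.compile(r"\s+[-\u2013]\s+")


def parse_executives(html):
    """Return (name, function, period, symbol) tuples, or [] if the div is missing."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find(id="article_participants")
    if div is None:
        return []  # files without the div are ignored

    # Assumption: the first link inside the div holds the ticker symbol.
    link = div.find("a")
    symbol = link.get_text(strip=True) if link else ""

    # One entry per non-empty text line inside the div.
    lines = [ln.strip() for ln in div.get_text("\n").splitlines() if ln.strip()]

    match = PERIOD_RE.search(" ".join(lines))
    period = match.group(0) if match else ""

    rows = []
    in_executives = False
    for line in lines:
        if line == "Executives":
            in_executives = True
        elif line == "Analysts":
            break
        elif in_executives:
            parts = SPLIT_RE.split(line, maxsplit=1)
            name = parts[0]
            function = parts[1] if len(parts) > 1 else ""
            rows.append((name, function, period, symbol))
    return rows


def parse_directory(directory):
    """Yield one tuple per executive across all .html files in the directory."""
    for path in sorted(Path(directory).glob("*.html")):
        # Assumption: UTF-8; undecodable bytes are replaced instead of raising.
        html = path.read_text(encoding="utf-8", errors="replace")
        yield from parse_executives(html)


if __name__ == "__main__":
    directory = "C:/Research syntheses - Meta analysis/SeekingAlpha/out"
    for row in parse_directory(directory):
        print(" | ".join(row))
```

The ticker is read from the link text rather than from the symbolSlug attribute, because html.parser lowercases attribute names and the text is already the symbol in the sample. If some transcripts label the sections differently (for example "Company Participants"), the two string comparisons in the loop would need to be widened.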
