Python Forum
Parse data from downloaded html
#1
I want to extract all the "executives" from a directory where I have stored my downloaded HTML files. The directory contains approximately 1,000 HTML files, each of which should have a div element with id=article_participants (files without it can be ignored):

Quote:
<DIV id=article_participants class="content_part hid">
<P>Redhill Biopharma Ltd. (NASDAQ:<A title="" href="http://seekingalpha.com/symbol/rdhl" symbolSlug="RDHL">RDHL</A>)</P>
<P>Q4 2014 <SPAN class=transcript-search-span style="BACKGROUND-COLOR: yellow">Earnings</SPAN> Conference <SPAN class=transcript-search-span style="BACKGROUND-COLOR: #f38686">Call</SPAN></P>
<P>February 26, 2015 9:00 AM ET</P>
<P><STRONG>Executives</STRONG></P>
<P>Dror Ben Asher - CEO</P>
<P>Ori Shilo - Deputy CEO, Finance and Operations</P>
<P>Guy Goldberg - Chief Business Officer</P>
<P><STRONG>Analysts</STRONG></P>

The output I need is Name, Function, Period, Symbol.
Example: Ori Shilo | Deputy CEO, Finance and Operations | Q4 2014 | RDHL
I tried the following, but it only reads the files and does not extract anything yet:
import os
from bs4 import BeautifulSoup

directory = 'C:/Research syntheses - Meta analysis/SeekingAlpha/out'
for filename in os.listdir(directory):
    if filename.endswith('.html'):
        fname = os.path.join(directory, filename)
        with open(fname, 'r') as f:
            # Parse each downloaded page; nothing is extracted from soup yet
            soup = BeautifulSoup(f.read(), 'html.parser')

# Header for the output table
print('{:<30} {:<70}'.format('Name', 'Answer'))
print('-' * 101)
Can someone help me?
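For reference, here is a minimal sketch of one way the extraction described above could be approached with BeautifulSoup. It is not a tested solution for all 1,000 files. It assumes every page follows the layout of the quoted sample: the ticker is the text of the first link inside the div, the period appears as "Q<n> <year>" in one of the paragraphs, and each executive line has the form "Name - Function". The names `extract_executives`, `collect_rows` and `PERIOD_RE` are made up for this illustration.

```python
import os
import re

from bs4 import BeautifulSoup

# Assumption: the period is written like "Q4 2014" somewhere in the block.
PERIOD_RE = re.compile(r'\bQ[1-4]\s+\d{4}\b', re.IGNORECASE)


def extract_executives(html):
    """Return (name, function, period, symbol) tuples for one page.

    Returns an empty list when the page has no article_participants div.
    """
    soup = BeautifulSoup(html, 'html.parser')
    block = soup.find('div', id='article_participants')
    if block is None:
        return []

    # Assumption: the first link inside the block holds the ticker symbol.
    link = block.find('a')
    symbol = link.get_text(strip=True) if link else ''

    period = ''
    rows = []
    in_executives = False
    for p in block.find_all('p'):
        # Collapse whitespace so nested <span> text reads as one line.
        text = ' '.join(p.get_text().split())
        strong = p.find('strong')
        if strong is not None:
            # A paragraph containing bold text is treated as a section
            # heading such as "Executives" or "Analysts".
            in_executives = strong.get_text(strip=True).lower() == 'executives'
            continue
        if not period:
            match = PERIOD_RE.search(text)
            if match:
                period = match.group(0)
                continue
        if in_executives and ' - ' in text:
            # Split on the first " - " only, so functions may contain dashes.
            name, function = text.split(' - ', 1)
            rows.append((name.strip(), function.strip(), period, symbol))
    return rows


def collect_rows(directory):
    """Run extract_executives over every .html file in a directory."""
    rows = []
    for filename in sorted(os.listdir(directory)):
        if not filename.endswith('.html'):
            continue
        path = os.path.join(directory, filename)
        # Assumption: files are UTF-8; undecodable bytes are replaced.
        with open(path, 'r', encoding='utf-8', errors='replace') as f:
            rows.extend(extract_executives(f.read()))
    return rows
```

Calling `collect_rows(directory)` with the directory from the post and printing `' | '.join(row)` for each row would give lines in the requested Name | Function | Period | Symbol shape. Files whose layout differs from the sample (no " - " separator, a heading that is not bold, the period in another format) would need extra handling.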
Messages In This Thread
Parse data from downloaded html - by nikos48 - Jan-22-2020, 06:58 PM
RE: Parse data from downloaded html - by metulburr - Jan-23-2020, 02:54 PM
RE: Parse data from downloaded html - by buran - Jan-23-2020, 02:57 PM
RE: Parse data from downloaded html - by nikos48 - Jan-23-2020, 06:36 PM
RE: Parse data from downloaded html - by metulburr - Jan-25-2020, 01:37 AM
RE: Parse data from downloaded html - by nikos48 - Jan-25-2020, 11:24 AM
RE: Parse data from downloaded html - by metulburr - Jan-25-2020, 07:36 PM
RE: Parse data from downloaded html - by nikos48 - Jan-26-2020, 03:35 PM
