 Parsing based on variables in the website
#1
Hi,

I am a newbie at Python, so bear with me.
My code already works for multiple websites with the same setup (example: https://www.dropbox.com/s/uka24w7o5006ol....html?dl=0).
It is currently hard-coded for one specific CEO, but I want it to work for all executives named at the top of each individual HTML file. These executives are listed in the part of the HTML shown below:
https://i.stack.imgur.com/9zqOb.png

Could someone help me further?
Below is my code so far.

import textwrap
import os
from bs4 import BeautifulSoup

directory = 'C:/Research syntheses - Meta analysis/SeekingAlpha/out'
print('{:<30} {:<70}'.format('Name', 'Answer'))
print('-' * 101)
with open('output.txt', 'a') as out:  # open the output file once, not once per answer
    for filename in os.listdir(directory):
        if not filename.endswith('.html'):
            continue
        with open(os.path.join(directory, filename), 'r') as f:
            soup = BeautifulSoup(f.read(), 'html.parser')
        # Extract answers inside the file loop so every file is processed, not just the last one.
        for answer in soup.select('p:contains("Question-and-Answer Session") ~ strong:contains("Dror Ben Asher") + p'):
            txt = answer.get_text(strip=True)
            s = answer.find_next_sibling()
            while s:
                if s.name == 'strong' or s.find('strong'):
                    break
                if s.name == 'p':
                    txt += ' ' + s.get_text(strip=True)
                s = s.find_next_sibling()
            txt = ('\n' + ' '*31).join(textwrap.wrap(txt))
            print('{:<30} {:<70}'.format('Dror Ben Asher - CEO', txt), file=out)

#2
I'm not sure about your overall problem, but I suspect there's a bug in your program on line 19, in the s.find('strong') check:
Output:
>>> "abc".find('x')
-1
>>> bool("abc".find('x'))
True
It's going to be true if "strong" is not in the string. It would only be false if s starts with "strong" because that would result in an index of 0, which would be treated as false.
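Whether this actually bites depends on what s is. Below is a minimal check, assuming s is a BeautifulSoup Tag (which is what find_next_sibling() returns) rather than a str. In that case find() returns a Tag or None instead of an index, so the truthiness test behaves differently from str.find. The HTML snippet is made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up snippet for illustration only.
soup = BeautifulSoup('<p>plain</p><p><strong>Name</strong></p>', 'html.parser')
plain, with_strong = soup.find_all('p')

str_miss = 'abc'.find('x')            # str.find signals "not found" with -1
tag_miss = plain.find('strong')       # Tag.find signals "not found" with None
tag_hit = with_strong.find('strong')  # a Tag object when a match exists

print(str_miss, bool(str_miss))       # -1 is truthy, which is the str pitfall
print(tag_miss, bool(tag_miss))       # None is falsy
print(tag_hit.name, tag_hit is not None)
```

So if s is a Tag, the "or s.find('strong')" part is true only when a strong tag is nested inside s. If s were ever a plain string, the -1 pitfall above would apply.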
#3
Thank you! Is there also someone who can help me with my main problem?
In principle I need all the answers of the executives mentioned in the HTML (the executives are identified at the top of the HTML).
Quote
#4
Maybe I can clarify my question:
In my (downloaded) HTML files, the executives are listed at the top of every file (like "Dror Ben Asher" in the HTML below):
Quote:
<DIV id=article_participants class="content_part hid">
<P>Redhill Biopharma Ltd. (NASDAQ:<A title="" href="http://seekingalpha.com/symbol/rdhl" symbolSlug="RDHL">RDHL</A>)</P>
<P>Q4 2014 <SPAN class=transcript-search-span style="BACKGROUND-COLOR: yellow">Earnings</SPAN> Conference <SPAN class=transcript-search-span style="BACKGROUND-COLOR: #f38686">Call</SPAN></P>
<P>February 26, 2015 9:00 AM ET</P>
<P><STRONG>Executives</STRONG></P>
<P>Dror Ben Asher - CEO</P>
<P>Ori Shilo - Deputy CEO, Finance and Operations</P>
<P>Guy Goldberg - Chief Business Officer</P>
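One way to read this list might look like the sketch below. It assumes that every name sits in its own P tag after the "Executives" header, that name and title are separated by " - ", and that the list ends at the next P containing a STRONG tag. The "Analysts" section and "Jane Doe" are made up to show where the list stops, and get_executives is a hypothetical helper name.

```python
import re
from bs4 import BeautifulSoup

# Shortened copy of the participants block; the Analysts part is invented.
html = """
<div id="article_participants" class="content_part hid">
<p><strong>Executives</strong></p>
<p>Dror Ben Asher - CEO</p>
<p>Ori Shilo - Deputy CEO, Finance and Operations</p>
<p>Guy Goldberg - Chief Business Officer</p>
<p><strong>Analysts</strong></p>
<p>Jane Doe - Example Bank</p>
</div>
"""

def get_executives(soup):
    """Return {name: title} for the <p> tags listed under 'Executives'."""
    header = soup.find('strong', string=re.compile(r'^\s*Executives\s*$'))
    executives = {}
    if header is None:
        return executives
    # header.parent is the <p> wrapping the <strong>; walk the <p> tags after it.
    for p in header.parent.find_next_siblings('p'):
        if p.find('strong') is not None:   # next section header, stop here
            break
        name, _, title = p.get_text(strip=True).partition(' - ')
        if name:
            executives[name.strip()] = title.strip()
    return executives

executives = get_executives(BeautifulSoup(html, 'html.parser'))
print(executives)
```

html.parser lowercases tag names, so the upper-case tags in the downloaded files should match the same lookups.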

Further along in the HTML, these executives' names recur multiple times, each followed by a text element that I want to parse. Example:
Quote:
<P>
<STRONG> Dror Ben Asher </STRONG>
</P>
<P>Yeah, in terms of production in first quarter, we’re going to be lower than we had forecasted mainly due to our grade. We’ve had a couple of higher grade stopes in our Seabee complex that we’ve had some significant problems in terms of ground failures and dilution effects. In addition, not helping out, we’ve had some equipment downtime on some of our smaller silt development, so the combination of those two issues are affecting us.
</p>

For now I have code (see my post above) that identifies one executive, "Dror Ben Asher", and grabs all the text that occurs afterwards in the P elements. But I would like this to work for all executives, and for multiple HTML files in which different executives are mentioned (different companies).
I shared the downloaded HTML file on Dropbox: dropbox.com/s/uka24w7o5006ole/transcript-86-855.html?dl=0
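A possible sketch of the generalised loop, under stated assumptions: the names to look for are passed in as a list (in practice they would come from the Executives list at the top of each file), each speaker is marked by a STRONG tag after the "Question-and-Answer Session" heading, and an answer runs until the next STRONG tag. The transcript fragment is made up and collect_answers is a hypothetical name.

```python
import re
from bs4 import BeautifulSoup

# Made-up transcript fragment following the structure described above.
html = """
<p><strong>Question-and-Answer Session</strong></p>
<p><strong>Operator</strong></p>
<p>First question, please.</p>
<p><strong> Dror Ben Asher </strong></p>
<p>First part of the answer.</p>
<p>Second part of the answer.</p>
<p><strong>Ori Shilo</strong></p>
<p>Another answer.</p>
"""

def collect_answers(soup, names):
    """Return [(speaker, text)] for each Q&A turn spoken by one of `names`."""
    wanted = set(names)
    marker = soup.find(string=re.compile('Question-and-Answer Session'))
    if marker is None:
        return []
    answers = []
    for strong in marker.find_all_next('strong'):
        speaker = strong.get_text(strip=True)
        if speaker not in wanted:
            continue
        # The speaker tag may be wrapped in its own <p> or stand alone.
        node = strong.parent if strong.parent.name == 'p' else strong
        parts = []
        for sib in node.find_next_siblings():
            if sib.name == 'strong' or sib.find('strong') is not None:
                break                      # the next speaker starts here
            if sib.name == 'p':
                parts.append(sib.get_text(strip=True))
        answers.append((speaker, ' '.join(parts)))
    return answers

soup = BeautifulSoup(html, 'html.parser')
answers = collect_answers(soup, ['Dror Ben Asher', 'Ori Shilo'])
for speaker, text in answers:
    print('{:<30} {}'.format(speaker, text))
```

Calling this once per file inside the os.listdir loop, with that file's own list of executives, would cover different companies without hard-coding any name.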