Jan-14-2020, 08:51 AM
Hi,
I am a newbie at Python, so bear with me.
My code is already working for multiple websites with the same setup (example: https://www.dropbox.com/s/uka24w7o5006ol....html?dl=0).
My code is currently hard-coded for one specific CEO, but I want it to work for all executives named at the top of each individual HTML file. These executives are listed in the part of the HTML shown below:
https://i.stack.imgur.com/9zqOb.png
Could someone help me further?
Below is my code so far.
import textwrap
import os
from bs4 import BeautifulSoup

directory ='C:/Research syntheses - Meta analysis/SeekingAlpha/out'
for filename in os.listdir(directory):
    if filename.endswith('.html'):
        fname = os.path.join(directory,filename)
        with open(fname, 'r') as f:
            soup = BeautifulSoup(f.read(),'html.parser')
            print('{:<30} {:<70}'.format('Name', 'Answer'))
            print('-' * 101)
            for answer in soup.select('p:contains("Question-and-Answer Session") ~ strong:contains("Dror Ben Asher") + p'):
                txt = answer.get_text(strip=True)
                s = answer.find_next_sibling()
                while s:
                    if s.name == 'strong' or s.find('strong'):
                        break
                    if s.name == 'p':
                        txt += ' ' + s.get_text(strip=True)
                    s = s.find_next_sibling()
                txt = ('\n' + ' '*31).join(textwrap.wrap(txt))
                print('{:<30} {:<70}'.format('Dror Ben Asher - CEO', txt), file=open("output.txt", "a"))
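
To make clearer what I am after, here is a rough sketch of the idea, not working code for my files. It assumes each executive appears as a line of the form "Name - Title" (that format is my assumption, based on the label I print for the CEO). The helper names split_executive and answer_selector and the second entry in the list are made up for illustration. The part I am missing is how to read those lines from the top of each HTML file.

```python
def split_executive(line):
    """Split a 'Name - Title' line into (name, title); title is '' if absent."""
    name, _, title = line.partition(' - ')
    return name.strip(), title.strip()


def answer_selector(name):
    """Build the same CSS selector as in my code, but for any speaker name."""
    return ('p:contains("Question-and-Answer Session") ~ '
            'strong:contains("{}") + p'.format(name))


# Hypothetical list: in the real script these lines would have to be read
# from the top of each HTML file, which is the step I do not know how to do.
executive_lines = ['Dror Ben Asher - CEO', 'Jane Doe - CFO']

# One selector per executive, keyed by the label I want in the output column.
selectors = {}
for line in executive_lines:
    name, title = split_executive(line)
    selectors[line] = answer_selector(name)

for label, selector in selectors.items():
    print(label, '->', selector)
```

Each selector could then replace the hard-coded one in the soup.select(...) call, with the label replacing 'Dror Ben Asher - CEO' in the final print.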