Python Forum
Parsing based on variables in the website
#1
Hi,

I am a newbie at Python, so bear with me.
My code already works for multiple pages with the same setup (example: https://www.dropbox.com/s/uka24w7o5006ol....html?dl=0).
Right now the code is hard-coded for one specific CEO, but I want it to work for all executives named at the top of each individual HTML file. These executives are listed in the part of the HTML shown below:
https://i.stack.imgur.com/9zqOb.png

Could someone help me take this further?
Below is the code so far.

import os
import textwrap
from bs4 import BeautifulSoup

directory = 'C:/Research syntheses - Meta analysis/SeekingAlpha/out'

print('{:<30} {:<70}'.format('Name', 'Answer'))
print('-' * 101)

# Open the output file once instead of re-opening it for every row.
with open('output.txt', 'a') as out:
    for filename in os.listdir(directory):
        if not filename.endswith('.html'):
            continue
        fname = os.path.join(directory, filename)
        with open(fname, 'r') as f:
            soup = BeautifulSoup(f.read(), 'html.parser')

        # Parse inside the loop; otherwise only the last file read is processed.
        # Select the first <p> after each <strong> naming the CEO, after the Q&A heading.
        for answer in soup.select('p:contains("Question-and-Answer Session") ~ strong:contains("Dror Ben Asher") + p'):
            txt = answer.get_text(strip=True)

            # Append the following <p> tags until the next speaker (<strong>) starts.
            s = answer.find_next_sibling()
            while s:
                if s.name == 'strong' or s.find('strong'):
                    break
                if s.name == 'p':
                    txt += ' ' + s.get_text(strip=True)
                s = s.find_next_sibling()

            # Wrap long answers and indent continuation lines under the answer column.
            txt = ('\n' + ' ' * 31).join(textwrap.wrap(txt))

            print('{:<30} {:<70}'.format('Dror Ben Asher - CEO', txt), file=out)
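
One possible way to generalise this to every executive is sketched below. It is only a sketch under assumptions, since the real markup is only visible in the screenshot: it assumes the names sit in plain <p> tags of the form "Name - Title" directly after an "Executives" heading, and that each speaker in the Q&A part is announced by a <strong> tag holding just the name. SAMPLE_HTML and the helper names (find_heading, get_executives, collect_answers) are made up for illustration. It walks the siblings by hand instead of building a :contains() selector per name, which avoids quoting problems with unusual names.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample that mimics the ASSUMED transcript layout.
SAMPLE_HTML = """
<p><strong>Executives</strong></p>
<p>Dror Ben Asher - CEO</p>
<p>Jane Doe - CFO</p>
<p><strong>Analysts</strong></p>
<p>John Smith - Example Bank</p>
<p><strong>Question-and-Answer Session</strong></p>
<strong>John Smith</strong>
<p>What about revenue?</p>
<strong>Dror Ben Asher</strong>
<p>Revenue grew.</p>
<p>We expect more.</p>
<strong>Jane Doe</strong>
<p>Margins are stable.</p>
"""


def is_heading(tag):
    """True for a <strong> tag or any tag that wraps one."""
    return tag.name == 'strong' or tag.find('strong') is not None


def find_heading(soup, title):
    """Return the first <p> or <strong> whose text is exactly `title`."""
    for tag in soup.find_all(['p', 'strong']):
        if tag.get_text(strip=True) == title:
            return tag
    return None


def get_executives(soup):
    """Map each executive's name to the full 'Name - Title' label."""
    header = find_heading(soup, 'Executives')
    executives = {}
    if header is None:
        return executives
    for sib in header.find_next_siblings(True):
        if is_heading(sib):  # next section (e.g. Analysts) starts here
            break
        if sib.name == 'p':
            label = sib.get_text(strip=True)
            # Split on a hyphen or en dash surrounded by spaces (assumed format).
            name = re.split(r'\s+[-\u2013]\s+', label, maxsplit=1)[0]
            if name:
                executives[name] = label
    return executives


def collect_answers(soup, name):
    """Return one joined string per Q&A turn spoken by `name`."""
    qa = find_heading(soup, 'Question-and-Answer Session')
    if qa is None:
        return []
    answers, current = [], None  # current: paragraphs of the turn being read
    for sib in qa.find_next_siblings(True):
        if is_heading(sib):
            if current:
                answers.append(' '.join(current))
            current = [] if sib.get_text(strip=True) == name else None
        elif sib.name == 'p' and current is not None:
            current.append(sib.get_text(strip=True))
    if current:
        answers.append(' '.join(current))
    return answers


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
executives = get_executives(soup)
for name, label in executives.items():
    for answer in collect_answers(soup, name):
        print('{:<30} {:<70}'.format(label, answer))
```

In the per-file loop, the two helpers would replace the hard-coded selector and the hard-coded 'Dror Ben Asher - CEO' label. If the real pages differ from the assumed layout (for example, the names are wrapped in other tags), find_heading and get_executives are the parts to adjust.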
Messages In This Thread
Parsing based on variables in the website - by nikos48 - Jan-14-2020, 08:51 AM
