Python Forum
Help on parsing simple text on HTML
Here's a starter for you using Selenium.
You will need the display script below; place it in the same directory as the main Selenium script.
PrettifyPage.py:
class PrettifyPage:
    """Re-indent BeautifulSoup's prettify() output (one space per nesting
    level) so each level is indented by `indent` spaces instead."""

    def prettify(self, soup, indent):
        pretty_soup = ""
        previous_indent = 0
        for line in soup.prettify().split("\n"):
            # A tag line's depth is the position of its "<". Text lines have no
            # "<" (or one implausibly far right), so treat them as one level
            # deeper than the previous line.
            current_indent = line.find("<")
            if current_indent == -1 or current_indent > previous_indent + 2:
                current_indent = previous_indent + 1
            previous_indent = current_indent
            pretty_soup += self.write_new_line(line, current_indent, indent)
        return pretty_soup

    def write_new_line(self, line, current_indent, desired_indent):
        # The line already starts with current_indent spaces; add the rest so
        # the total becomes current_indent * desired_indent.
        spaces_to_add = max(current_indent * desired_indent - current_indent, 0)
        return " " * spaces_to_add + str(line) + "\n"
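For reference, what PrettifyPage does boils down to scaling indentation. Assuming every line of prettify() output (tags and text alike) is indented by exactly one space per nesting level, which is BeautifulSoup's default, the same idea can be sketched with the standard library alone. `reindent` is a hypothetical helper name; it is not part of bs4 and is not used by the main script below.

```python
def reindent(pretty_html, width=2):
    """Scale one-space-per-level indentation to `width` spaces per level.

    Assumption: every line is indented by one space per nesting level,
    as in BeautifulSoup's default prettify() output.
    """
    lines = []
    for line in pretty_html.splitlines():
        text = line.lstrip(" ")
        depth = len(line) - len(text)  # leading spaces == nesting depth
        lines.append(" " * (depth * width) + text)
    return "\n".join(lines) + "\n"


# A <th> as prettify() would emit it (one space per level).
sample = '<th scope="col">\n Comp.\n <span class="sort-arrows">\n </span>\n</th>'
print(reindent(sample, 2))
```

Counting leading spaces avoids guessing the depth of text lines from the previous line, which is what the find("<") heuristic in the class has to do.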
Main script, GetPrices.py:
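from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

import PrettifyPage


class GetPrices:
    def __init__(self):
        self.pp = PrettifyPage.PrettifyPage()
        self.baseurl = "https://live.euronext.com/fr/product/index-options/"

    def start_browser(self):
        # Marionette (geckodriver) has been the default Firefox driver since
        # Selenium 3, and recent Selenium 4 releases no longer accept the old
        # capabilities= argument, so no extra options are needed.
        return webdriver.Firefox()

    def stop_browser(self, browser):
        # quit() ends the whole session (and geckodriver); close() only
        # closes the current window.
        browser.quit()

    def get_quote(self, symbol):
        prettify = self.pp.prettify
        driver = self.start_browser()
        try:
            driver.get(f"{self.baseurl}{symbol}")
            # The prices are presumably rendered by JavaScript (hence Selenium):
            # wait up to 10 s for the first table row instead of a fixed sleep.
            WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, "#prices_tables_0 tbody tr"))
            )
            soup = BeautifulSoup(driver.page_source, "lxml")
        finally:
            # Always shut the browser down, even if the page fails to load.
            self.stop_browser(driver)

        table = soup.find("table", {"id": "prices_tables_0"})
        for n, th in enumerate(table.thead.tr.find_all("th")):
            print(f"\n====================== th_{n} ======================")
            print(prettify(th, 2))
        for n, quote in enumerate(table.tbody.find_all("tr")):
            print(f"\n====================== tr_quote{n} ======================")
            print(prettify(quote, 2))


if __name__ == "__main__":
    gp = GetPrices()
    gp.get_quote("PXA-DPAR")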
This script does nothing with the scraped data other than display it; I leave the data manipulation to you.
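As one possible starting point, each row can be turned into a record keyed by strike. This is a minimal sketch, assuming the 11-column layout seen in the sample output below (calls on the left, strike in the middle, puts mirrored on the right) and reading the French headers as Dernier = last, Achat = bid, Vente = ask, with Comp. assumed to be the settlement price. `parse_row`, `to_number` and the column maps are hypothetical names, and numbers are assumed to have no thousands separators.

```python
# Column positions inferred from the sample header output (th_0 .. th_10).
CALL_COLS = {0: "settle", 1: "last", 2: "bid", 3: "ask"}   # Comp., Dernier, Achat, Vente
STRIKE_COL = 5                                              # Strike
PUT_COLS = {7: "bid", 8: "ask", 9: "last", 10: "settle"}   # Achat, Vente, Dernier, Comp.


def to_number(text):
    """The page shows '-' when there is no quote; map that to None."""
    text = text.strip()
    return None if text in ("", "-") else float(text)


def parse_row(cells):
    """Turn the 11 cell strings of one option-chain row into a dict."""
    if len(cells) != 11:
        raise ValueError(f"expected 11 cells, got {len(cells)}")
    return {
        "strike": float(cells[STRIKE_COL]),
        "call": {name: to_number(cells[i]) for i, name in CALL_COLS.items()},
        "put": {name: to_number(cells[i]) for i, name in PUT_COLS.items()},
    }


# Cell texts copied from tr_quote0 in the sample output.
row = ["130.23", "-", "122.80", "150.00", "C", "5925.00", "P", "-", "-", "21.40", "21.71"]
print(parse_row(row))
```

Inside get_quote, the cell list for each row could come from `[td.get_text(strip=True) for td in quote.find_all("td")]`, after which the records are easy to feed into csv or pandas.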
Sample output (the column headers are French: Dernier = last, Achat = bid, Vente = ask, Comp. = settlement price):
====================== th_0 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="3" rowspan="1" scope="col">
  Comp.
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_1 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="2" rowspan="1" scope="col">
  Dernier
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_2 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="4" rowspan="1" scope="col">
  Achat
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_3 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="4" rowspan="1" scope="col">
  Vente
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_4 ======================
<th class="sorting_disabled" colspan="1" data-priority="1" rowspan="1" scope="col">
</th>
 


====================== th_5 ======================
<th class="sorting_disabled font-weight-bold" colspan="1" data-priority="1" rowspan="1" scope="col">
  Strike
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_6 ======================
<th class="sorting_disabled" colspan="1" data-priority="1" rowspan="1" scope="col">
</th>
 


====================== th_7 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="4" rowspan="1" scope="col">
  Achat
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_8 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="4" rowspan="1" scope="col">
  Vente
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_9 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="2" rowspan="1" scope="col">
  Dernier
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_10 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="3" rowspan="1" scope="col">
  Comp.
  <span class="sort-arrows">
  </span>
</th>


====================== tr_quote0 ======================
<tr class="bg-ui-grey-0 odd" role="row">
  <td>
    130.23
  </td>
  <td>
    -
  </td>
  <td>
    122.80
  </td>
  <td>
    150.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=592500&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    5925.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=592500&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    21.40
  </td>
  <td>
    21.71
  </td>
</tr>
 


====================== tr_quote1 ======================
<tr class="bg-ui-grey-0 even" role="row">
  <td>
    109.83
  </td>
  <td>
    121.40
  </td>
  <td>
    105.00
  </td>
  <td>
    128.60
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=595000&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    5950.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=595000&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    24.70
  </td>
  <td>
    26.32
  </td>
</tr>
 


====================== tr_quote2 ======================
<tr class="bg-ui-grey-0 odd" role="row">
  <td>
    90.37
  </td>
  <td>
    102.00
  </td>
  <td>
    -
  </td>
  <td>
    108.10
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=597500&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    5975.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=597500&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    28.30
  </td>
  <td>
    31.86
  </td>
</tr>
 


====================== tr_quote3 ======================
<tr class="bg-ui-grey-0 even" role="row">
  <td>
    72.37
  </td>
  <td>
    76.00
  </td>
  <td>
    72.40
  </td>
  <td>
    150.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=600000&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6000.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=600000&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    45.00
  </td>
  <td>
    35.00
  </td>
  <td>
    38.86
  </td>
</tr>
 


====================== tr_quote4 ======================
<tr class="bg-ui-grey-0 odd" role="row">
  <td>
    55.77
  </td>
  <td>
    65.20
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=602500&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6025.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=602500&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    43.10
  </td>
  <td>
    47.27
  </td>
</tr>
 


====================== tr_quote5 ======================
<tr class="bg-mantis-green-50 even" role="row">
  <td>
    40.68
  </td>
  <td>
    43.00
  </td>
  <td>
    -
  </td>
  <td>
    49.50
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=605000&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6050.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=605000&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    90.00
  </td>
  <td>
    53.20
  </td>
  <td>
    57.19
  </td>
</tr>
 


====================== tr_quote6 ======================
<tr class="bg-ui-grey-0 odd" role="row">
  <td>
    28.48
  </td>
  <td>
    35.50
  </td>
  <td>
    -
  </td>
  <td>
    40.70
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=607500&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6075.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=607500&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    76.20
  </td>
  <td>
    63.00
  </td>
  <td>
    69.98
  </td>
</tr>
 


====================== tr_quote7 ======================
<tr class="bg-ui-grey-0 even" role="row">
  <td>
    19.16
  </td>
  <td>
    21.00
  </td>
  <td>
    -
  </td>
  <td>
    30.10
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=610000&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6100.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=610000&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    85.67
  </td>
</tr>
 


====================== tr_quote8 ======================
<tr class="bg-ui-grey-0 odd" role="row">
  <td>
    11.81
  </td>
  <td>
    18.60
  </td>
  <td>
    -
  </td>
  <td>
    15.60
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=612500&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6125.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=612500&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    103.32
  </td>
</tr>
 


====================== tr_quote9 ======================
<tr class="bg-ui-grey-0 even" role="row">
  <td>
    7.17
  </td>
  <td>
    7.70
  </td>
  <td>
    2.90
  </td>
  <td>
    24.90
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=615000&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6150.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=615000&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    132.60
  </td>
  <td>
    -
  </td>
  <td>
    123.70
  </td>
</tr>
 


====================== tr_quote10 ======================
<tr class="bg-ui-grey-0 odd" role="row">
  <td>
    2.42
  </td>
  <td>
    2.80
  </td>
  <td>
    -
  </td>
  <td>
    7.50
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=620000&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6200.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=620000&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    168.96
  </td>
</tr>
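The C and P links in each row also carry contract details in their query string. Judging only from the sample above, `cOrP` looks like the option type, `md` the maturity as MM-YYYY, and `sp` the strike multiplied by 100 (sp=592500 appears in the 5925.00 row); these meanings are inferred, not documented. A sketch using urllib.parse, where `contract_from_href` is a hypothetical helper. Note that BeautifulSoup's `a["href"]` already decodes `&amp;` to `&`, so pass that value rather than the prettified text.

```python
from urllib.parse import parse_qs, urlsplit


def contract_from_href(href):
    """Extract option details from a C/P link in the prices table.

    Field meanings are inferred from the sample output only:
    cOrP = call/put, sp = strike * 100, md = maturity as MM-YYYY.
    """
    query = parse_qs(urlsplit(href).query)
    month, year = query["md"][0].split("-")
    return {
        "symbol": query["Class_symbol"][0],
        "type": "call" if query["cOrP"][0] == "C" else "put",
        "strike": int(query["sp"][0]) / 100,
        "maturity": (int(year), int(month)),
    }


href = ("/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&ps=pagesize"
        "&pmd=maturitydates&Class_exchange=DPAR&fOrO=O&cOrP=C&sp=592500&md=01-2020")
print(contract_from_href(href))
```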


Messages In This Thread
Help on parsing simple text on HTML - by amaumox - Jan-02-2020, 08:28 PM
RE: Help on parsing simple text on HTML - by Larz60+ - Jan-02-2020, 10:46 PM
