Python Forum
Help on parsing simple text on HTML
#1
Hello Everyone,

I've been trying to parse an HTML page to get a single price, but without success. I've looked at YouTube tutorials and many websites, but I still need your help.

page_options_cac40 ="https://live.euronext.com/fr/product/index-options/PXA-DPAR"
import bs4
import requests
from bs4 import BeautifulSoup

def get_all_strikes():
    r=requests.get(page_options_cac40)
    page = bs4.BeautifulSoup(r.text,"html.parser")
    
    #<td class=" font-weight-bold">4400.00</td> <--- line with the code and the strike to find
    
    premier_strike = page.find('td', attrs={'class': 'font-weight-bolde'})
    print(premier_strike)
I'd like to get the first strike price, which is 4400. The code runs but prints "None".
Thank you very much in advance.

Happy new year
#2
There is no strike price which is 4400 on this site.

Output:
January 2020 Cours - 02/01/20 Comp. Dernier Achat Vente Strike Achat Vente Dernier Comp. 130.23 - 122.80 150.00 C 5925.00 P - - 21.40 21.71 109.83 121.40 105.00 128.60 C 5950.00 P - - 24.70 26.32 90.37 102.00 - 108.10 C 5975.00 P - - 28.30 31.86 72.37 76.00 72.40 150.00 C 6000.00 P - 45.00 35.00 38.86 55.77 65.20 - - C 6025.00 P - - 43.10 47.27 40.68 43.00 - 49.50 C 6050.00 P - 90.00 53.20 57.19 28.48 35.50 - 40.70 C 6075.00 P - 76.20 63.00 69.98 19.16 21.00 - 30.10 C 6100.00 P - - - 85.67 11.81 18.60 - 15.60 C 6125.00 P - - - 103.32 7.17 7.70 2.90 24.90 C 6150.00 P - 132.60 - 123.70 2.42 2.80 - 7.50 C 6200.00 P - - - 168.96
#3
Also, you will need to use selenium to render the JavaScript, otherwise you will not find your table.
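
One way to tell whether a table is built by JavaScript is to look for its cells in the raw HTML that requests receives (r.text in the first post). If the cells you see in the browser are missing there, the page fills them in after loading, and a plain HTTP request will never see them. The sketch below uses only the standard library's html.parser so it runs without extra installs; the class CellCounter, the function count_cells and the two HTML strings are hypothetical stand-ins, not the real content of the Euronext page.

```python
from html.parser import HTMLParser


class CellCounter(HTMLParser):
    """Count <td> start tags whose class list contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "td":
            return
        # The class attribute may be missing or carry stray spaces.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_cells(html, class_name="font-weight-bold"):
    """Return how many matching <td> cells appear in the given HTML text."""
    parser = CellCounter(class_name)
    parser.feed(html)
    return parser.count


# Hypothetical stand-in for a static response: an empty shell plus a script.
static_html = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'
# Hypothetical stand-in for the DOM after the browser has run the scripts.
rendered_html = '<table><tr><td class=" font-weight-bold">4400.00</td></tr></table>'

print(count_cells(static_html))    # 0
print(count_cells(rendered_html))  # 1
```

Note that the class name is compared exactly, so a misspelled name such as font-weight-bolde matches nothing even on the rendered page.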
#4
(Jan-02-2020, 09:08 PM)Axel_Erfurt Wrote: There is no strike price which is 4400 on this site.


Hey, that's probably because the page was displayed in compact mode. Try the "All strikes" option (on the right, next to the Maturity selection).

By the way, how the heck did you get this output? Tell me the magic, lol.

(Jan-02-2020, 09:09 PM)Larz60+ Wrote: Also, you will need to use selenium to render the JavaScript, otherwise you will not find your table.

OK, but I have no clear idea how to do that. How could you know that the page is built with JavaScript?
I don't know much about web programming.

Thank you, by the way.
#5
Here's a starter for you with Selenium.
You will need the display script below (place it in the same directory as the main Selenium script).
PrettifyPage.py:
import requests
import pathlib


class PrettifyPage:
    def __init__(self):
        pass

    def prettify(self, soup, indent):
        pretty_soup = str()
        previous_indent = 0
        for line in soup.prettify().split("\n"):
            current_indent = str(line).find("<")
            if current_indent == -1 or current_indent > previous_indent + 2:
                current_indent = previous_indent + 1
            previous_indent = current_indent
            pretty_soup += self.write_new_line(line, current_indent, indent)
        return pretty_soup

    def write_new_line(self, line, current_indent, desired_indent):
        # Pad the line so that each indent level becomes desired_indent spaces wide.
        spaces_to_add = (current_indent * desired_indent) - current_indent
        return " " * max(spaces_to_add, 0) + str(line) + "\n"
main script
GetPrices.py:
from selenium import webdriver
import time
from bs4 import BeautifulSoup
import PrettifyPage


class GetPrices:
    def __init__(self):
        self.pp = PrettifyPage.PrettifyPage()
        self.baseurl = "https://live.euronext.com/fr/product/index-options/"

    def start_browser(self):
        caps = webdriver.DesiredCapabilities().FIREFOX
        caps["marionette"] = True
        return webdriver.Firefox(capabilities=caps)

    def stop_browser(self, browser):
        browser.close()

    def get_quote(self, symbol):
        prettify = self.pp.prettify
        driver = self.start_browser()
        driver.get(f"{self.baseurl}{symbol}")
        time.sleep(2)
        source = driver.page_source
        soup = BeautifulSoup(source, 'lxml')
        table = soup.find('table', {'id': 'prices_tables_0'})
        headerinfo = table.thead.tr.find_all('th')
        for n, th in enumerate(headerinfo):
            print(f"\n====================== th_{n} ======================")
            print(f"{prettify(th, 2)}")
        quotes = table.tbody.find_all('tr')
        for n, quote in enumerate(quotes):
            print(f"\n====================== tr_quote{n} ======================")
            print(f"{prettify(quote, 2)}")
        self.stop_browser(driver)


if __name__ == '__main__':
    gp = GetPrices()
    gp.get_quote('PXA-DPAR')
This does not do anything with the scraped data other than display it.
I leave the data manipulation to you.
Sample output (I put it in code tags for scrolling):
====================== th_0 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="3" rowspan="1" scope="col">
  Comp.
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_1 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="2" rowspan="1" scope="col">
  Dernier
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_2 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="4" rowspan="1" scope="col">
  Achat
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_3 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="4" rowspan="1" scope="col">
  Vente
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_4 ======================
<th class="sorting_disabled" colspan="1" data-priority="1" rowspan="1" scope="col">
</th>
 


====================== th_5 ======================
<th class="sorting_disabled font-weight-bold" colspan="1" data-priority="1" rowspan="1" scope="col">
  Strike
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_6 ======================
<th class="sorting_disabled" colspan="1" data-priority="1" rowspan="1" scope="col">
</th>
 


====================== th_7 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="4" rowspan="1" scope="col">
  Achat
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_8 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="4" rowspan="1" scope="col">
  Vente
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_9 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="2" rowspan="1" scope="col">
  Dernier
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_10 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="3" rowspan="1" scope="col">
  Comp.
  <span class="sort-arrows">
  </span>
</th>


====================== tr_quote0 ======================
<tr class="bg-ui-grey-0 odd" role="row">
  <td>
    130.23
  </td>
  <td>
    -
  </td>
  <td>
    122.80
  </td>
  <td>
    150.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=592500&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    5925.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=592500&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    21.40
  </td>
  <td>
    21.71
  </td>
</tr>
 


====================== tr_quote1 ======================
<tr class="bg-ui-grey-0 even" role="row">
  <td>
    109.83
  </td>
  <td>
    121.40
  </td>
  <td>
    105.00
  </td>
  <td>
    128.60
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=595000&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    5950.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=595000&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    24.70
  </td>
  <td>
    26.32
  </td>
</tr>
 


====================== tr_quote2 ======================
<tr class="bg-ui-grey-0 odd" role="row">
  <td>
    90.37
  </td>
  <td>
    102.00
  </td>
  <td>
    -
  </td>
  <td>
    108.10
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=597500&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    5975.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=597500&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    28.30
  </td>
  <td>
    31.86
  </td>
</tr>
 


====================== tr_quote3 ======================
<tr class="bg-ui-grey-0 even" role="row">
  <td>
    72.37
  </td>
  <td>
    76.00
  </td>
  <td>
    72.40
  </td>
  <td>
    150.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=600000&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6000.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=600000&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    45.00
  </td>
  <td>
    35.00
  </td>
  <td>
    38.86
  </td>
</tr>
 


====================== tr_quote4 ======================
<tr class="bg-ui-grey-0 odd" role="row">
  <td>
    55.77
  </td>
  <td>
    65.20
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=602500&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6025.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=602500&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    43.10
  </td>
  <td>
    47.27
  </td>
</tr>
 


====================== tr_quote5 ======================
<tr class="bg-mantis-green-50 even" role="row">
  <td>
    40.68
  </td>
  <td>
    43.00
  </td>
  <td>
    -
  </td>
  <td>
    49.50
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=605000&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6050.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=605000&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    90.00
  </td>
  <td>
    53.20
  </td>
  <td>
    57.19
  </td>
</tr>
 


====================== tr_quote6 ======================
<tr class="bg-ui-grey-0 odd" role="row">
  <td>
    28.48
  </td>
  <td>
    35.50
  </td>
  <td>
    -
  </td>
  <td>
    40.70
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=607500&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6075.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=607500&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    76.20
  </td>
  <td>
    63.00
  </td>
  <td>
    69.98
  </td>
</tr>
 


====================== tr_quote7 ======================
<tr class="bg-ui-grey-0 even" role="row">
  <td>
    19.16
  </td>
  <td>
    21.00
  </td>
  <td>
    -
  </td>
  <td>
    30.10
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=610000&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6100.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=610000&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    85.67
  </td>
</tr>
 


====================== tr_quote8 ======================
<tr class="bg-ui-grey-0 odd" role="row">
  <td>
    11.81
  </td>
  <td>
    18.60
  </td>
  <td>
    -
  </td>
  <td>
    15.60
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=612500&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6125.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=612500&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    103.32
  </td>
</tr>
 


====================== tr_quote9 ======================
<tr class="bg-ui-grey-0 even" role="row">
  <td>
    7.17
  </td>
  <td>
    7.70
  </td>
  <td>
    2.90
  </td>
  <td>
    24.90
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=615000&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6150.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=615000&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    132.60
  </td>
  <td>
    -
  </td>
  <td>
    123.70
  </td>
</tr>
 


====================== tr_quote10 ======================
<tr class="bg-ui-grey-0 odd" role="row">
  <td>
    2.42
  </td>
  <td>
    2.80
  </td>
  <td>
    -
  </td>
  <td>
    7.50
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=620000&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6200.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=620000&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    168.96
  </td>
</tr>
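
As one possible next step with rows like the ones above, here is a minimal sketch that turns each <tr> into a dictionary of numbers. It uses only the standard library's html.parser instead of BeautifulSoup, so it runs without extra installs. The names RowCollector, to_number, parse_quotes and the column labels are my own, chosen from the header order in the sample output, and I am assuming that the columns left of the strike belong to the call and those on the right to the put. The sample_row string is a shortened copy of tr_quote0 with the long links replaced by "#".

```python
from html.parser import HTMLParser

# One label per <td>, in header order; None marks the "C" and "P" link cells.
# The call_/put_ prefixes are an assumption about which side is which.
COLUMNS = ["call_comp", "call_dernier", "call_achat", "call_vente", None,
           "strike", None, "put_achat", "put_vente", "put_dernier", "put_comp"]


class RowCollector(HTMLParser):
    """Collect the stripped text of every <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cells = None  # cells of the row being read, None outside a row
        self._text = None   # text pieces of the cell being read, None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag == "td" and self._cells is not None:
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._text is not None:
            self._cells.append("".join(self._text).strip())
            self._text = None
        elif tag == "tr" and self._cells is not None:
            self.rows.append(self._cells)
            self._cells = None


def to_number(text):
    """Convert '5925.00' to a float; '-' (no quote) and other text become None."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_quotes(html):
    """Return one dict per table row that has the expected number of cells."""
    collector = RowCollector()
    collector.feed(html)
    quotes = []
    for cells in collector.rows:
        if len(cells) != len(COLUMNS):
            continue  # e.g. a header row, which contains <th> cells only
        quotes.append({name: to_number(text)
                       for name, text in zip(COLUMNS, cells)
                       if name is not None})
    return quotes


sample_row = """
<tr class="bg-ui-grey-0 odd" role="row">
  <td>130.23</td><td>-</td><td>122.80</td><td>150.00</td>
  <td><a href="#">C</a></td>
  <td class="font-weight-bold">5925.00</td>
  <td><a href="#">P</a></td>
  <td>-</td><td>-</td><td>21.40</td><td>21.71</td>
</tr>
"""

quotes = parse_quotes(sample_row)
print(quotes[0]["strike"])        # 5925.0
print(quotes[0]["call_dernier"])  # None
```

In the Selenium script, the string passed to parse_quotes would be driver.page_source; cells that are not plain decimal numbers come back as None, so adjust to_number if the site uses a different number format.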
#6
Thank you very much for your help. I'll try the solutions you gave me.