 Help on parsing simple text on HTML
#1
Hello Everyone,

I've been trying to parse an HTML page to extract a single price, without success. I have tried YouTube tutorials and many websites, but I still need your help.

import requests
from bs4 import BeautifulSoup

page_options_cac40 = "https://live.euronext.com/fr/product/index-options/PXA-DPAR"

def get_all_strikes():
    r = requests.get(page_options_cac40)
    page = BeautifulSoup(r.text, "html.parser")

    # <td class=" font-weight-bold">4400.00</td> <--- line with the code and the strike to find

    # The class name must match the one in the page exactly
    premier_strike = page.find('td', attrs={'class': 'font-weight-bold'})
    print(premier_strike)

get_all_strikes()
I'd like to get the first strike price, which is 4400. The code runs but prints "None".
Thank you very much in advance.

Happy new year
#2
There is no strike price of 4400 on this site.

Output:
January 2020 Cours - 02/01/20
Comp. Dernier Achat Vente Strike Achat Vente Dernier Comp.
130.23 - 122.80 150.00 C 5925.00 P - - 21.40 21.71
109.83 121.40 105.00 128.60 C 5950.00 P - - 24.70 26.32
90.37 102.00 - 108.10 C 5975.00 P - - 28.30 31.86
72.37 76.00 72.40 150.00 C 6000.00 P - 45.00 35.00 38.86
55.77 65.20 - - C 6025.00 P - - 43.10 47.27
40.68 43.00 - 49.50 C 6050.00 P - 90.00 53.20 57.19
28.48 35.50 - 40.70 C 6075.00 P - 76.20 63.00 69.98
19.16 21.00 - 30.10 C 6100.00 P - - - 85.67
11.81 18.60 - 15.60 C 6125.00 P - - - 103.32
7.17 7.70 2.90 24.90 C 6150.00 P - 132.60 - 123.70
2.42 2.80 - 7.50 C 6200.00 P - - - 168.96
#3
Also, you will need to use selenium to render the JavaScript, otherwise you will not find your table.
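
One rough way to tell whether a table is built by JavaScript is to look for it in the raw HTML that requests receives: if the element is missing or empty there but visible in the browser, it is most likely filled in by a script after the page loads. Below is a minimal sketch of that check. The function name and the two sample pages are made up for illustration, and the table id is an assumption about the page, so check it in your browser's inspector.

```python
from bs4 import BeautifulSoup


def table_in_static_html(html, table_id):
    """Return True if a <table> with this id and at least one <td> is in the raw HTML."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        return False
    return len(table.find_all("td")) > 0


# Hypothetical stand-ins for r.text from a static page and from a JavaScript-driven page
static_page = '<table id="prices_tables_0"><tr><td>5925.00</td></tr></table>'
js_page = '<div id="app"></div><script src="app.js"></script>'

print(table_in_static_html(static_page, "prices_tables_0"))
print(table_in_static_html(js_page, "prices_tables_0"))
```

If the check fails on the text returned by requests.get(), the data is not in the downloaded HTML and a browser driver such as selenium is one way to get it.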
#4
(Jan-02-2020, 09:08 PM)Axel_Erfurt Wrote: There is no strike price which is 4400 on this site.

Hey, that's probably because you are viewing the compact mode. Try "All strikes" (just to the right, next to the Maturity selection).

By the way, how the heck did you get this output? Tell me the magic, lol.

(Jan-02-2020, 09:09 PM)Larz60+ Wrote: Also, you will need to use selenium to render the JavaScript, otherwise you will not find your table.

OK... but I have no clear idea how to do that. How can you tell that the page is built with JavaScript?
I don't have much knowledge of web programming...

Thank you, by the way.
#5
Here's a starter for you with selenium.
You will need the display script below (place it in the same directory as the main selenium script).
PrettifyPage.py:
class PrettifyPage:
    """Re-indent BeautifulSoup's prettify() output with a wider indent."""

    def prettify(self, soup, indent):
        pretty_soup = ""
        previous_indent = 0
        for line in soup.prettify().split("\n"):
            # prettify() indents one space per level, so the position of the
            # first "<" gives the nesting level of a tag line
            current_indent = line.find("<")
            # Text lines (no tag) and unexpected jumps are placed one level
            # below the previous line
            if current_indent == -1 or current_indent > previous_indent + 2:
                current_indent = previous_indent + 1
            previous_indent = current_indent
            pretty_soup += self.write_new_line(line, current_indent, indent)
        return pretty_soup

    def write_new_line(self, line, current_indent, desired_indent):
        # Pad the line so that each nesting level takes desired_indent spaces
        spaces_to_add = (current_indent * desired_indent) - current_indent
        return " " * max(spaces_to_add, 0) + str(line) + "\n"
main script
GetPrices.py:
from selenium import webdriver
import time
from bs4 import BeautifulSoup
import PrettifyPage


class GetPrices:
    def __init__(self):
        self.pp = PrettifyPage.PrettifyPage()
        self.baseurl = "https://live.euronext.com/fr/product/index-options/"

    def start_browser(self):
        # Opens a Firefox window controlled by selenium
        return webdriver.Firefox()

    def stop_browser(self, browser):
        # quit() ends the whole browser session, not just the current window
        browser.quit()

    def get_quote(self, symbol):
        prettify = self.pp.prettify
        driver = self.start_browser()
        try:
            driver.get(f"{self.baseurl}{symbol}")
            # Give the page's JavaScript time to build the table
            time.sleep(2)
            source = driver.page_source
        finally:
            # Close the browser even if loading the page fails
            self.stop_browser(driver)

        soup = BeautifulSoup(source, 'lxml')
        table = soup.find('table', {'id': 'prices_tables_0'})
        if table is None:
            print("Table not found; the page may need more time to load")
            return

        # Column headers
        headerinfo = table.thead.tr.find_all('th')
        for n, th in enumerate(headerinfo):
            print(f"\n====================== th_{n} ======================")
            print(f"{prettify(th, 2)}")

        # One row per strike
        quotes = table.tbody.find_all('tr')
        for n, quote in enumerate(quotes):
            print(f"\n====================== tr_quote{n} ======================")
            print(f"{prettify(quote, 2)}")


if __name__ == '__main__':
    gp = GetPrices()
    gp.get_quote('PXA-DPAR')
This does not do anything with the scraped data other than display it.
I leave the data manipulation to you.
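
As a possible starting point for that manipulation, here is a minimal sketch that pulls the strike prices out of the rendered page source. It assumes the strikes are the td cells with class font-weight-bold inside the table with id prices_tables_0. The get_strikes name and the SAMPLE string are made up for illustration; SAMPLE is a shortened stand-in for driver.page_source.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for driver.page_source (two rows, most cells omitted)
SAMPLE = """
<table id="prices_tables_0"><tbody>
<tr><td>130.23</td><td><a class="text-ui-picton-blue font-weight-bold">C</a></td>
<td class="font-weight-bold">5925.00</td><td>21.71</td></tr>
<tr><td>109.83</td><td><a class="text-ui-picton-blue font-weight-bold">C</a></td>
<td class="font-weight-bold">5950.00</td><td>26.32</td></tr>
</tbody></table>
"""


def get_strikes(page_source):
    """Return the strike prices found in the rendered table as floats."""
    soup = BeautifulSoup(page_source, "html.parser")
    table = soup.find("table", id="prices_tables_0")
    if table is None:
        return []
    # Only <td> cells are searched, so the bold C/P links (<a> tags) are skipped
    cells = table.find_all("td", class_="font-weight-bold")
    return [float(td.get_text(strip=True)) for td in cells]


strikes = get_strikes(SAMPLE)
print(strikes)
```

With the real page you would pass driver.page_source instead of SAMPLE, and strikes[0] would be the first strike shown.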
Sample output (I put it in code tags for scrolling):
====================== th_0 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="3" rowspan="1" scope="col">
  Comp.
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_1 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="2" rowspan="1" scope="col">
  Dernier
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_2 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="4" rowspan="1" scope="col">
  Achat
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_3 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="4" rowspan="1" scope="col">
  Vente
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_4 ======================
<th class="sorting_disabled" colspan="1" data-priority="1" rowspan="1" scope="col">
</th>
 


====================== th_5 ======================
<th class="sorting_disabled font-weight-bold" colspan="1" data-priority="1" rowspan="1" scope="col">
  Strike
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_6 ======================
<th class="sorting_disabled" colspan="1" data-priority="1" rowspan="1" scope="col">
</th>
 


====================== th_7 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="4" rowspan="1" scope="col">
  Achat
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_8 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="4" rowspan="1" scope="col">
  Vente
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_9 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="2" rowspan="1" scope="col">
  Dernier
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_10 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="3" rowspan="1" scope="col">
  Comp.
  <span class="sort-arrows">
  </span>
</th>


====================== tr_quote0 ======================
<tr class="bg-ui-grey-0 odd" role="row">
  <td>
    130.23
  </td>
  <td>
    -
  </td>
  <td>
    122.80
  </td>
  <td>
    150.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=592500&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    5925.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=592500&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    21.40
  </td>
  <td>
    21.71
  </td>
</tr>
 


====================== tr_quote1 ======================
<tr class="bg-ui-grey-0 even" role="row">
  <td>
    109.83
  </td>
  <td>
    121.40
  </td>
  <td>
    105.00
  </td>
  <td>
    128.60
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=595000&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    5950.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=595000&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    24.70
  </td>
  <td>
    26.32
  </td>
</tr>
 


====================== tr_quote2 ======================
<tr class="bg-ui-grey-0 odd" role="row">
  <td>
    90.37
  </td>
  <td>
    102.00
  </td>
  <td>
    -
  </td>
  <td>
    108.10
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=597500&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    5975.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=597500&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    28.30
  </td>
  <td>
    31.86
  </td>
</tr>
 


====================== tr_quote3 ======================
<tr class="bg-ui-grey-0 even" role="row">
  <td>
    72.37
  </td>
  <td>
    76.00
  </td>
  <td>
    72.40
  </td>
  <td>
    150.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=600000&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6000.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=600000&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    45.00
  </td>
  <td>
    35.00
  </td>
  <td>
    38.86
  </td>
</tr>
 


====================== tr_quote4 ======================
<tr class="bg-ui-grey-0 odd" role="row">
  <td>
    55.77
  </td>
  <td>
    65.20
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=602500&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6025.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=602500&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    43.10
  </td>
  <td>
    47.27
  </td>
</tr>
 


====================== tr_quote5 ======================
<tr class="bg-mantis-green-50 even" role="row">
  <td>
    40.68
  </td>
  <td>
    43.00
  </td>
  <td>
    -
  </td>
  <td>
    49.50
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=605000&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6050.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=605000&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    90.00
  </td>
  <td>
    53.20
  </td>
  <td>
    57.19
  </td>
</tr>
 


====================== tr_quote6 ======================
<tr class="bg-ui-grey-0 odd" role="row">
  <td>
    28.48
  </td>
  <td>
    35.50
  </td>
  <td>
    -
  </td>
  <td>
    40.70
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=607500&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6075.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=607500&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    76.20
  </td>
  <td>
    63.00
  </td>
  <td>
    69.98
  </td>
</tr>
 


====================== tr_quote7 ======================
<tr class="bg-ui-grey-0 even" role="row">
  <td>
    19.16
  </td>
  <td>
    21.00
  </td>
  <td>
    -
  </td>
  <td>
    30.10
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=610000&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6100.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=610000&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    85.67
  </td>
</tr>
 


====================== tr_quote8 ======================
<tr class="bg-ui-grey-0 odd" role="row">
  <td>
    11.81
  </td>
  <td>
    18.60
  </td>
  <td>
    -
  </td>
  <td>
    15.60
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=612500&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6125.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=612500&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    103.32
  </td>
</tr>
 


====================== tr_quote9 ======================
<tr class="bg-ui-grey-0 even" role="row">
  <td>
    7.17
  </td>
  <td>
    7.70
  </td>
  <td>
    2.90
  </td>
  <td>
    24.90
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=615000&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6150.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=615000&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    132.60
  </td>
  <td>
    -
  </td>
  <td>
    123.70
  </td>
</tr>
 


====================== tr_quote10 ======================
<tr class="bg-ui-grey-0 odd" role="row">
  <td>
    2.42
  </td>
  <td>
    2.80
  </td>
  <td>
    -
  </td>
  <td>
    7.50
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=620000&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6200.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=620000&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    168.96
  </td>
</tr>
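
Each tr in the output above has eleven cells, so a row can be turned into a dictionary with numbers instead of text. This is only a sketch: the key names are my own labels (the page only titles the columns Comp., Dernier, Achat, Vente and Strike), and treating the left-hand columns as the call side and the right-hand ones as the put side is an assumption based on where the C and P links sit.

```python
from bs4 import BeautifulSoup

# Hypothetical key names, in the same order as the cells of one row
COLUMNS = ["call_comp", "call_dernier", "call_achat", "call_vente", "call_link",
           "strike",
           "put_link", "put_achat", "put_vente", "put_dernier", "put_comp"]


def to_value(text):
    """Convert a cell to float; '-' (no value) becomes None, other text is kept."""
    text = text.strip()
    if text == "-":
        return None
    try:
        return float(text)
    except ValueError:
        return text  # for example the "C" and "P" link cells


def parse_row(tr):
    """Map the cells of one <tr> onto COLUMNS."""
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    return {name: to_value(cell) for name, cell in zip(COLUMNS, cells)}


# The cell values of tr_quote3 above, with the attributes stripped
ROW = ("<tr><td>72.37</td><td>76.00</td><td>72.40</td><td>150.00</td><td><a>C</a></td>"
       "<td>6000.00</td><td><a>P</a></td><td>-</td><td>45.00</td><td>35.00</td>"
       "<td>38.86</td></tr>")

row = parse_row(BeautifulSoup(ROW, "html.parser").tr)
print(row["strike"], row["put_achat"], row["put_vente"])
```

In the main script, the same parse_row could be applied to each element of quotes instead of printing it.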
#6
Thank you very much for your help, I'll try the solutions you gave me.