Python Forum
Help on parsing simple text on HTML - Printable Version



Help on parsing simple text on HTML - amaumox - Jan-02-2020

Hello Everyone,

I've been trying to parse an HTML page to get a single price, but I can't get it to work. I've looked at YouTube tutorials and many websites, and I still need your help.

import requests
from bs4 import BeautifulSoup

page_options_cac40 = "https://live.euronext.com/fr/product/index-options/PXA-DPAR"


def get_all_strikes():
    r = requests.get(page_options_cac40)
    page = BeautifulSoup(r.text, "html.parser")

    # <td class=" font-weight-bold">4400.00</td> <--- line with the code and the strike to find

    # Look for the first <td> carrying the "font-weight-bold" class
    premier_strike = page.find('td', attrs={'class': 'font-weight-bold'})
    print(premier_strike)


get_all_strikes()
I'd like to get the first strike price, which is 4400. The code runs but prints "None".
Thank you very much in advance.

Happy New Year!


RE: Help on parsing simple text on HTML - Axel_Erfurt - Jan-02-2020

There is no strike price of 4400 on this site.

Output:
January 2020 Cours - 02/01/20
Comp.    Dernier  Achat    Vente        Strike        Achat    Vente    Dernier  Comp.
130.23   -        122.80   150.00   C   5925.00   P   -        -        21.40    21.71
109.83   121.40   105.00   128.60   C   5950.00   P   -        -        24.70    26.32
90.37    102.00   -        108.10   C   5975.00   P   -        -        28.30    31.86
72.37    76.00    72.40    150.00   C   6000.00   P   -        45.00    35.00    38.86
55.77    65.20    -        -        C   6025.00   P   -        -        43.10    47.27
40.68    43.00    -        49.50    C   6050.00   P   -        90.00    53.20    57.19
28.48    35.50    -        40.70    C   6075.00   P   -        76.20    63.00    69.98
19.16    21.00    -        30.10    C   6100.00   P   -        -        -        85.67
11.81    18.60    -        15.60    C   6125.00   P   -        -        -        103.32
7.17     7.70     2.90     24.90    C   6150.00   P   -        132.60   -        123.70
2.42     2.80     -        7.50     C   6200.00   P   -        -        -        168.96



RE: Help on parsing simple text on HTML - Larz60+ - Jan-02-2020

Also, you will need to use selenium to render the JavaScript, otherwise you will not find your table.
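
A quick way to check this, sketched under some assumptions: look for the table in the HTML that requests receives (r.text), before any script has run. If the element is missing there but visible in the browser, it is most likely added by JavaScript. The sketch below uses only the standard library. The two HTML strings are made-up stand-ins for the fetched page and the browser-rendered page, and has_element_with_id is a hypothetical helper name. The id prices_tables_0 is the one the selenium script in this thread looks up.

```python
from html.parser import HTMLParser


class IdFinder(HTMLParser):
    """Records whether any start tag carries the wanted id attribute."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if dict(attrs).get("id") == self.wanted_id:
            self.found = True


def has_element_with_id(html, wanted_id):
    """Return True if the HTML text contains an element with this id."""
    finder = IdFinder(wanted_id)
    finder.feed(html)
    return finder.found


# Made-up examples: what requests might receive vs. what a browser shows
static_html = "<html><body><div id='app'></div><script src='app.js'></script></body></html>"
rendered_html = "<html><body><table id='prices_tables_0'><tr><td>5925.00</td></tr></table></body></html>"

print(has_element_with_id(static_html, "prices_tables_0"))    # False
print(has_element_with_id(rendered_html, "prices_tables_0"))  # True
```

In practice you would pass r.text from requests.get(...) as the first argument. A False result for an element you can see in the browser suggests that a tool which runs the page's scripts is needed.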


RE: Help on parsing simple text on HTML - amaumox - Jan-02-2020

(Jan-02-2020, 09:08 PM)Axel_Erfurt Wrote: There is no strike price of 4400 on this site.

Hey, that's probably because you are viewing the compact mode. Try "All strikes" (just to the right, next to the Maturity selection).

By the way, how the heck did you get this output? Tell me the magic, lol.

(Jan-02-2020, 09:09 PM)Larz60+ Wrote: Also, you will need to use selenium to render the JavaScript, otherwise you will not find your table.

OK, but I have no clear idea how to do that. How can you tell that the page is rendered with JavaScript?
I don't know much about web programming.

Thank you, by the way.


RE: Help on parsing simple text on HTML - Larz60+ - Jan-02-2020

Here's a starter for you with selenium.
You will need the display script below (place it in the same directory as the main selenium script).
PrettifyPage.py:
class PrettifyPage:
    """Re-indent BeautifulSoup's prettify() output with a custom indent width."""

    def prettify(self, soup, indent):
        pretty_soup = ""
        previous_indent = 0
        for line in soup.prettify().split("\n"):
            # The position of the first "<" is used as the nesting depth
            current_indent = str(line).find("<")
            # Lines without "<" (plain text) and implausible jumps are
            # placed one level below the previous line
            if current_indent == -1 or current_indent > previous_indent + 2:
                current_indent = previous_indent + 1
            previous_indent = current_indent
            pretty_soup += self.write_new_line(line, current_indent, indent)
        return pretty_soup

    def write_new_line(self, line, current_indent, desired_indent):
        # Pad the line so that text starting at column current_indent
        # ends up at column current_indent * desired_indent
        spaces_to_add = (current_indent * desired_indent) - current_indent
        return " " * max(spaces_to_add, 0) + str(line) + "\n"
main script
GetPrices.py:
from selenium import webdriver
import time
from bs4 import BeautifulSoup
import PrettifyPage


class GetPrices:
    def __init__(self):
        self.pp = PrettifyPage.PrettifyPage()
        self.baseurl = "https://live.euronext.com/fr/product/index-options/"

    def start_browser(self):
        # Marionette is already the default for Firefox, so no extra
        # capabilities are needed
        return webdriver.Firefox()

    def stop_browser(self, browser):
        # quit() ends the whole browser session, not just the current window
        browser.quit()

    def get_quote(self, symbol):
        prettify = self.pp.prettify
        driver = self.start_browser()
        try:
            driver.get(f"{self.baseurl}{symbol}")
            # Crude fixed wait for the JavaScript to build the table
            time.sleep(2)
            source = driver.page_source
        finally:
            # Close the browser even if loading the page fails
            self.stop_browser(driver)
        soup = BeautifulSoup(source, 'lxml')
        table = soup.find('table', {'id': 'prices_tables_0'})
        if table is None:
            print("Table 'prices_tables_0' not found; the page may not have finished loading")
            return
        headerinfo = table.thead.tr.find_all('th')
        for n, th in enumerate(headerinfo):
            print(f"\n====================== th_{n} ======================")
            print(f"{prettify(th, 2)}")
        quotes = table.tbody.find_all('tr')
        for n, quote in enumerate(quotes):
            print(f"\n====================== tr_quote{n} ======================")
            print(f"{prettify(quote, 2)}")


if __name__ == '__main__':
    gp = GetPrices()
    gp.get_quote('PXA-DPAR')
This does not do anything with the scraped data other than display it.
I leave the data manipulation to you.
Sample output (I put it in code tags for scrolling):
====================== th_0 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="3" rowspan="1" scope="col">
  Comp.
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_1 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="2" rowspan="1" scope="col">
  Dernier
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_2 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="4" rowspan="1" scope="col">
  Achat
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_3 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="4" rowspan="1" scope="col">
  Vente
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_4 ======================
<th class="sorting_disabled" colspan="1" data-priority="1" rowspan="1" scope="col">
</th>
 


====================== th_5 ======================
<th class="sorting_disabled font-weight-bold" colspan="1" data-priority="1" rowspan="1" scope="col">
  Strike
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_6 ======================
<th class="sorting_disabled" colspan="1" data-priority="1" rowspan="1" scope="col">
</th>
 


====================== th_7 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="4" rowspan="1" scope="col">
  Achat
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_8 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="4" rowspan="1" scope="col">
  Vente
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_9 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="2" rowspan="1" scope="col">
  Dernier
  <span class="sort-arrows">
  </span>
</th>
 


====================== th_10 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="3" rowspan="1" scope="col">
  Comp.
  <span class="sort-arrows">
  </span>
</th>


====================== tr_quote0 ======================
<tr class="bg-ui-grey-0 odd" role="row">
  <td>
    130.23
  </td>
  <td>
    -
  </td>
  <td>
    122.80
  </td>
  <td>
    150.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=592500&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    5925.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=592500&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    21.40
  </td>
  <td>
    21.71
  </td>
</tr>
 


====================== tr_quote1 ======================
<tr class="bg-ui-grey-0 even" role="row">
  <td>
    109.83
  </td>
  <td>
    121.40
  </td>
  <td>
    105.00
  </td>
  <td>
    128.60
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=595000&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    5950.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=595000&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    24.70
  </td>
  <td>
    26.32
  </td>
</tr>
 


====================== tr_quote2 ======================
<tr class="bg-ui-grey-0 odd" role="row">
  <td>
    90.37
  </td>
  <td>
    102.00
  </td>
  <td>
    -
  </td>
  <td>
    108.10
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=597500&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    5975.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=597500&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    28.30
  </td>
  <td>
    31.86
  </td>
</tr>
 


====================== tr_quote3 ======================
<tr class="bg-ui-grey-0 even" role="row">
  <td>
    72.37
  </td>
  <td>
    76.00
  </td>
  <td>
    72.40
  </td>
  <td>
    150.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=600000&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6000.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=600000&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    45.00
  </td>
  <td>
    35.00
  </td>
  <td>
    38.86
  </td>
</tr>
 


====================== tr_quote4 ======================
<tr class="bg-ui-grey-0 odd" role="row">
  <td>
    55.77
  </td>
  <td>
    65.20
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=602500&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6025.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=602500&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    43.10
  </td>
  <td>
    47.27
  </td>
</tr>
 


====================== tr_quote5 ======================
<tr class="bg-mantis-green-50 even" role="row">
  <td>
    40.68
  </td>
  <td>
    43.00
  </td>
  <td>
    -
  </td>
  <td>
    49.50
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=605000&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6050.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=605000&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    90.00
  </td>
  <td>
    53.20
  </td>
  <td>
    57.19
  </td>
</tr>
 


====================== tr_quote6 ======================
<tr class="bg-ui-grey-0 odd" role="row">
  <td>
    28.48
  </td>
  <td>
    35.50
  </td>
  <td>
    -
  </td>
  <td>
    40.70
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=607500&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6075.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=607500&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    76.20
  </td>
  <td>
    63.00
  </td>
  <td>
    69.98
  </td>
</tr>
 


====================== tr_quote7 ======================
<tr class="bg-ui-grey-0 even" role="row">
  <td>
    19.16
  </td>
  <td>
    21.00
  </td>
  <td>
    -
  </td>
  <td>
    30.10
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=610000&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6100.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=610000&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    85.67
  </td>
</tr>
 


====================== tr_quote8 ======================
<tr class="bg-ui-grey-0 odd" role="row">
  <td>
    11.81
  </td>
  <td>
    18.60
  </td>
  <td>
    -
  </td>
  <td>
    15.60
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=612500&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6125.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=612500&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    103.32
  </td>
</tr>
 


====================== tr_quote9 ======================
<tr class="bg-ui-grey-0 even" role="row">
  <td>
    7.17
  </td>
  <td>
    7.70
  </td>
  <td>
    2.90
  </td>
  <td>
    24.90
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=615000&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6150.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=615000&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    132.60
  </td>
  <td>
    -
  </td>
  <td>
    123.70
  </td>
</tr>
 


====================== tr_quote10 ======================
<tr class="bg-ui-grey-0 odd" role="row">
  <td>
    2.42
  </td>
  <td>
    2.80
  </td>
  <td>
    -
  </td>
  <td>
    7.50
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=C&amp;sp=620000&amp;md=01-2020">
      C
    </a>
  </td>
  <td class="font-weight-bold">
    6200.00
  </td>
  <td>
    <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&amp;ps=pagesize&amp;pmd=maturitydates&amp;Class_exchange=DPAR&amp;fOrO=O&amp;cOrP=P&amp;sp=620000&amp;md=01-2020">
      P
    </a>
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    -
  </td>
  <td>
    168.96
  </td>
</tr>
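
As one possible next step for the data manipulation, here is a minimal sketch that turns the cell texts of one row into numbers. It assumes the cell strings have already been pulled out of each tr, for example with [td.get_text(strip=True) for td in quote.find_all('td')] inside get_quote. The key names are this sketch's own labels, not anything defined by the site: the call_/put_ prefixes assume that the columns left of Strike belong to the call ("C") and those to the right belong to the put ("P"), and row_to_record and to_number are hypothetical helper names.

```python
from typing import Dict, List, Optional

# Column order as printed in the sample header (th_0 .. th_10).
# Labels are assumptions of this sketch (see the note above the code).
COLUMNS = [
    "call_comp", "call_dernier", "call_achat", "call_vente", "call_link",
    "strike",
    "put_link", "put_achat", "put_vente", "put_dernier", "put_comp",
]


def to_number(cell: str) -> Optional[float]:
    """Convert a cell such as '5925.00' to a float; '-' or empty becomes None."""
    cell = cell.strip()
    return None if cell in ("", "-") else float(cell)


def row_to_record(cells: List[str]) -> Dict[str, Optional[float]]:
    """Map the 11 cell texts of one table row to a dict of numbers."""
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}")
    record = {}
    for name, cell in zip(COLUMNS, cells):
        if name.endswith("_link"):
            continue  # the "C" / "P" link cells carry no price
        record[name] = to_number(cell)
    return record


# Cell texts of tr_quote0 from the sample output
first_row = ["130.23", "-", "122.80", "150.00", "C",
             "5925.00", "P", "-", "-", "21.40", "21.71"]
record = row_to_record(first_row)
print(record["strike"])  # 5925.0
```

Calling row_to_record on every row gives a list of dicts, and the first strike is then simply the "strike" value of the first record.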



RE: Help on parsing simple text on HTML - amaumox - Jan-03-2020

Thank you very much for your help. I'll try the solutions you gave me.