Thread: Help on parsing simple text on HTML (Python Forum, Web Scraping & Web Development, /thread-23495.html)
Help on parsing simple text on HTML - amaumox - Jan-02-2020

Hello everyone,

I've been trying to parse an HTML page to get a single price, but I can't get it to work. I have tried YouTube tutorials and many websites, and I still need your help.

    page_options_cac40 = "https://live.euronext.com/fr/product/index-options/PXA-DPAR"

    import bs4
    import requests
    from bs4 import BeautifulSoup

    def get_all_strikes():
        r = requests.get(page_options_cac40)
        page = bs4.BeautifulSoup(r.text, "html.parser")
        # <td class=" font-weight-bold">4400.00</td>  <--- line with the code and the strike to find
        premier_strike = page.find('td', attrs={'class': 'font-weight-bolde'})
        print(premier_strike)

I'd like to get the first strike price, which is 4400. The code runs but prints "None".

Thank you very much in advance. Happy new year!

RE: Help on parsing simple text on HTML - Axel_Erfurt - Jan-02-2020

There is no strike price which is 4400 on this site.
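
On the `None` in the original post, one cause can be checked without touching the live site: the class name passed to `find` is spelled `font-weight-bolde`, which does not match the `font-weight-bold` class in the commented line. The sketch below is only an illustration on a static string built from that comment (the surrounding `<table><tr>` wrapper is an assumption). It does not show that the live page contains this cell; if the table is filled in by JavaScript, the HTML returned by `requests.get` would not contain it at all, and fixing the spelling alone would not be enough.

```python
from bs4 import BeautifulSoup

# Static stand-in for the cell quoted in the post's comment (not fetched from the site).
html = '<table><tr><td class=" font-weight-bold">4400.00</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

# The misspelled class from the post matches nothing, so find() returns None.
misspelled = soup.find("td", attrs={"class": "font-weight-bolde"})

# With the class spelled correctly, the cell is found.
cell = soup.find("td", class_="font-weight-bold")
strike = float(cell.get_text(strip=True)) if cell is not None else None

print(misspelled)
print(strike)
```

A quick way to tell the two causes apart is to search `r.text` for a value visible in the browser; if it is absent from the raw response, the content is most likely added by a script after the page loads.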
RE: Help on parsing simple text on HTML - Larz60+ - Jan-02-2020

Also, you will need to use Selenium to render the JavaScript; otherwise you will not find your table.

RE: Help on parsing simple text on HTML - amaumox - Jan-02-2020

(Jan-02-2020, 09:08 PM)Axel_Erfurt Wrote: There is no strike price which is 4400 on this site.

Hey, that's probably because you displayed the compact mode. Try "All strikes" (just on the right, next to the Maturity selection). By the way, how the heck did you get this output? Tell me the magic, lol.

(Jan-02-2020, 09:09 PM)Larz60+ Wrote: Also, you will need to use selenium to render the JavaScript, otherwise you will not find your table.

OK, but I have no clear idea how to do that. How could you tell that the page is built with JavaScript? I don't know much about web programming. Thank you, by the way.

RE: Help on parsing simple text on HTML - Larz60+ - Jan-02-2020

Here's a starter for you with Selenium. You will need the display script below (place it in the same directory as the main Selenium script).

PrettifyPage.py:

    class PrettifyPage:
        def __init__(self):
            pass

        def prettify(self, soup, indent):
            """Re-indent soup.prettify() output using `indent` spaces per level."""
            pretty_soup = str()
            previous_indent = 0
            for line in soup.prettify().split("\n"):
                # prettify() indents one space per level, so the column of the
                # first "<" gives the nesting depth of this line.
                current_indent = str(line).find("<")
                if current_indent == -1 or current_indent > previous_indent + 2:
                    # Text line (or a jump in depth): one level below the previous line.
                    current_indent = previous_indent + 1
                previous_indent = current_indent
                pretty_soup += self.write_new_line(line, current_indent, indent)
            return pretty_soup

        def write_new_line(self, line, current_indent, desired_indent):
            # Pad the line so that each level is `desired_indent` spaces wide.
            spaces_to_add = (current_indent * desired_indent) - current_indent
            return " " * max(spaces_to_add, 0) + str(line) + "\n"

Main script, GetPrices.py:

    from selenium import webdriver
    import time
    from bs4 import BeautifulSoup
    import PrettifyPage


    class GetPrices:
        def __init__(self):
            self.pp = PrettifyPage.PrettifyPage()
            self.baseurl = "https://live.euronext.com/fr/product/index-options/"

        def start_browser(self):
            # Default Firefox session; geckodriver must be available to Selenium.
            return webdriver.Firefox()

        def stop_browser(self, browser):
            # quit() closes every window and ends the driver process.
            browser.quit()

        def get_quote(self, symbol):
            prettify = self.pp.prettify
            driver = self.start_browser()
            driver.get(f"{self.baseurl}{symbol}")
            # Crude wait so the page's JavaScript has time to build the table.
            time.sleep(2)
            source = driver.page_source
            soup = BeautifulSoup(source, 'lxml')
            table = soup.find('table', {'id': 'prices_tables_0'})

            # Print every header cell of the prices table.
            headerinfo = table.thead.tr.find_all('th')
            for n, th in enumerate(headerinfo):
                print(f"\n====================== th_{n} ======================")
                print(prettify(th, 2))

            # Print every quote row of the prices table.
            quotes = table.tbody.find_all('tr')
            for n, quote in enumerate(quotes):
                print(f"\n====================== tr_quote{n} ======================")
                print(prettify(quote, 2))
            self.stop_browser(driver)


    if __name__ == '__main__':
        gp = GetPrices()
        gp.get_quote('PXA-DPAR')

This does not do anything with the scraped data other than display it; I leave the data manipulation to you.

Sample output (I put it in code tags for scrolling):

====================== th_0 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="3" rowspan="1" scope="col"> Comp. 
<span class="sort-arrows"> </span> </th>
====================== th_1 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="2" rowspan="1" scope="col"> Dernier <span class="sort-arrows"> </span> </th>
====================== th_2 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="4" rowspan="1" scope="col"> Achat <span class="sort-arrows"> </span> </th>
====================== th_3 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="4" rowspan="1" scope="col"> Vente <span class="sort-arrows"> </span> </th>
====================== th_4 ======================
<th class="sorting_disabled" colspan="1" data-priority="1" rowspan="1" scope="col"> </th>
====================== th_5 ======================
<th class="sorting_disabled font-weight-bold" colspan="1" data-priority="1" rowspan="1" scope="col"> Strike <span class="sort-arrows"> </span> </th>
====================== th_6 ======================
<th class="sorting_disabled" colspan="1" data-priority="1" rowspan="1" scope="col"> </th>
====================== th_7 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="4" rowspan="1" scope="col"> Achat <span class="sort-arrows"> </span> </th>
====================== th_8 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="4" rowspan="1" scope="col"> Vente <span class="sort-arrows"> </span> </th>
====================== th_9 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="2" rowspan="1" scope="col"> Dernier <span class="sort-arrows"> </span> </th>
====================== th_10 ======================
<th class="text-right sorting_disabled" colspan="1" data-priority="3" rowspan="1" scope="col"> Comp. <span class="sort-arrows"> </span> </th>
====================== tr_quote0 ======================
<tr class="bg-ui-grey-0 odd" role="row"> <td> 130.23 </td> <td> - </td> <td> 122.80 </td> <td> 150.00 </td> <td> <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&ps=pagesize&pmd=maturitydates&Class_exchange=DPAR&fOrO=O&cOrP=C&sp=592500&md=01-2020"> C </a> </td> <td class="font-weight-bold"> 5925.00 </td> <td> <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&ps=pagesize&pmd=maturitydates&Class_exchange=DPAR&fOrO=O&cOrP=P&sp=592500&md=01-2020"> P </a> </td> <td> - </td> <td> - </td> <td> 21.40 </td> <td> 21.71 </td> </tr>
====================== tr_quote1 ======================
<tr class="bg-ui-grey-0 even" role="row"> <td> 109.83 </td> <td> 121.40 </td> <td> 105.00 </td> <td> 128.60 </td> <td> <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&ps=pagesize&pmd=maturitydates&Class_exchange=DPAR&fOrO=O&cOrP=C&sp=595000&md=01-2020"> C </a> </td> <td class="font-weight-bold"> 5950.00 </td> <td> <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&ps=pagesize&pmd=maturitydates&Class_exchange=DPAR&fOrO=O&cOrP=P&sp=595000&md=01-2020"> P </a> </td> <td> - </td> <td> - </td> <td> 24.70 </td> <td> 26.32 </td> </tr>
====================== tr_quote2 ======================
<tr class="bg-ui-grey-0 odd" role="row"> <td> 90.37 </td> <td> 102.00 </td> <td> - </td> <td> 108.10 </td> <td> <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&ps=pagesize&pmd=maturitydates&Class_exchange=DPAR&fOrO=O&cOrP=C&sp=597500&md=01-2020"> C </a> </td> <td class="font-weight-bold"> 5975.00 </td> <td> <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&ps=pagesize&pmd=maturitydates&Class_exchange=DPAR&fOrO=O&cOrP=P&sp=597500&md=01-2020"> P </a> </td> <td> - </td> <td> - </td> <td> 28.30 </td> <td> 31.86 </td> </tr>
====================== tr_quote3 ======================
<tr class="bg-ui-grey-0 even" role="row"> <td> 72.37 </td> <td> 76.00 </td> <td> 72.40 </td> <td> 150.00 </td> <td> <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&ps=pagesize&pmd=maturitydates&Class_exchange=DPAR&fOrO=O&cOrP=C&sp=600000&md=01-2020"> C </a> </td> <td class="font-weight-bold"> 6000.00 </td> <td> <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&ps=pagesize&pmd=maturitydates&Class_exchange=DPAR&fOrO=O&cOrP=P&sp=600000&md=01-2020"> P </a> </td> <td> - </td> <td> 45.00 </td> <td> 35.00 </td> <td> 38.86 </td> </tr>
====================== tr_quote4 ======================
<tr class="bg-ui-grey-0 odd" role="row"> <td> 55.77 </td> <td> 65.20 </td> <td> - </td> <td> - </td> <td> <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&ps=pagesize&pmd=maturitydates&Class_exchange=DPAR&fOrO=O&cOrP=C&sp=602500&md=01-2020"> C </a> </td> <td class="font-weight-bold"> 6025.00 </td> <td> <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&ps=pagesize&pmd=maturitydates&Class_exchange=DPAR&fOrO=O&cOrP=P&sp=602500&md=01-2020"> P </a> </td> <td> - </td> <td> - </td> <td> 43.10 </td> <td> 47.27 </td> </tr>
====================== tr_quote5 ======================
<tr class="bg-mantis-green-50 even" role="row"> <td> 40.68 </td> <td> 43.00 </td> <td> - </td> <td> 49.50 </td> <td> <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&ps=pagesize&pmd=maturitydates&Class_exchange=DPAR&fOrO=O&cOrP=C&sp=605000&md=01-2020"> C </a> </td> <td class="font-weight-bold"> 6050.00 </td> <td> <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&ps=pagesize&pmd=maturitydates&Class_exchange=DPAR&fOrO=O&cOrP=P&sp=605000&md=01-2020"> P </a> </td> <td> - </td> <td> 90.00 </td> <td> 53.20 </td> <td> 57.19 </td> </tr>
====================== tr_quote6 ======================
<tr class="bg-ui-grey-0 odd" role="row"> <td> 28.48 </td> <td> 35.50 </td> <td> - </td> <td> 40.70 </td> <td> <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&ps=pagesize&pmd=maturitydates&Class_exchange=DPAR&fOrO=O&cOrP=C&sp=607500&md=01-2020"> C </a> </td> <td class="font-weight-bold"> 6075.00 </td> <td> <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&ps=pagesize&pmd=maturitydates&Class_exchange=DPAR&fOrO=O&cOrP=P&sp=607500&md=01-2020"> P </a> </td> <td> - </td> <td> 76.20 </td> <td> 63.00 </td> <td> 69.98 </td> </tr>
====================== tr_quote7 ======================
<tr class="bg-ui-grey-0 even" role="row"> <td> 19.16 </td> <td> 21.00 </td> <td> - </td> <td> 30.10 </td> <td> <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&ps=pagesize&pmd=maturitydates&Class_exchange=DPAR&fOrO=O&cOrP=C&sp=610000&md=01-2020"> C </a> </td> <td class="font-weight-bold"> 6100.00 </td> <td> <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&ps=pagesize&pmd=maturitydates&Class_exchange=DPAR&fOrO=O&cOrP=P&sp=610000&md=01-2020"> P </a> </td> <td> - </td> <td> - </td> <td> - </td> <td> 85.67 </td> </tr>
====================== tr_quote8 ======================
<tr class="bg-ui-grey-0 odd" role="row"> <td> 11.81 </td> <td> 18.60 </td> <td> - </td> <td> 15.60 </td> <td> <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&ps=pagesize&pmd=maturitydates&Class_exchange=DPAR&fOrO=O&cOrP=C&sp=612500&md=01-2020"> C </a> </td> <td class="font-weight-bold"> 6125.00 </td> <td> <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&ps=pagesize&pmd=maturitydates&Class_exchange=DPAR&fOrO=O&cOrP=P&sp=612500&md=01-2020"> P </a> </td> <td> - </td> <td> - </td> <td> - </td> <td> 103.32 </td> </tr>
====================== tr_quote9 ======================
<tr class="bg-ui-grey-0 even" role="row"> <td> 7.17 </td> <td> 7.70 </td> <td> 2.90 </td> <td> 24.90 </td> <td> <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&ps=pagesize&pmd=maturitydates&Class_exchange=DPAR&fOrO=O&cOrP=C&sp=615000&md=01-2020"> C </a> </td> <td class="font-weight-bold"> 6150.00 </td> <td> <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&ps=pagesize&pmd=maturitydates&Class_exchange=DPAR&fOrO=O&cOrP=P&sp=615000&md=01-2020"> P </a> </td> <td> - </td> <td> 132.60 </td> <td> - </td> <td> 123.70 </td> </tr>
====================== tr_quote10 ======================
<tr class="bg-ui-grey-0 odd" role="row"> <td> 2.42 </td> <td> 2.80 </td> <td> - </td> <td> 7.50 </td> <td> <a class="text-ui-picton-blue font-weight-bold" href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&ps=pagesize&pmd=maturitydates&Class_exchange=DPAR&fOrO=O&cOrP=C&sp=620000&md=01-2020"> C </a> </td> <td class="font-weight-bold"> 6200.00 </td> <td> <a class="text-ui-picton-blue font-weight-bold" 
href="/fr/product/index-options/PXA-DPAR/instrument?Class_symbol=PXA&ps=pagesize&pmd=maturitydates&Class_exchange=DPAR&fOrO=O&cOrP=P&sp=620000&md=01-2020"> P </a> </td> <td> - </td> <td> - </td> <td> - </td> <td> 168.96 </td> </tr>

RE: Help on parsing simple text on HTML - amaumox - Jan-03-2020

Thank you very much for your help. I'll try the solutions you gave me.
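
For the data manipulation left open in the starter script, here is a minimal sketch of how the strikes might be pulled out of the rendered table. It assumes the rows look like the sample output above, where the strike sits in the only `td` with class `font-weight-bold`. `ROW_HTML` is the tr_quote1 row from that output with the long `href` values shortened to `#` and a `table`/`tbody` wrapper added for the demo; `to_number` and `parse_strikes` are hypothetical helper names, not part of the original scripts.

```python
from bs4 import BeautifulSoup

# One row copied from the sample output, hrefs shortened to "#" (assumption for brevity).
ROW_HTML = """
<table id="prices_tables_0"><tbody>
<tr class="bg-ui-grey-0 even" role="row">
  <td> 109.83 </td> <td> 121.40 </td> <td> 105.00 </td> <td> 128.60 </td>
  <td> <a class="text-ui-picton-blue font-weight-bold" href="#"> C </a> </td>
  <td class="font-weight-bold"> 5950.00 </td>
  <td> <a class="text-ui-picton-blue font-weight-bold" href="#"> P </a> </td>
  <td> - </td> <td> - </td> <td> 24.70 </td> <td> 26.32 </td>
</tr>
</tbody></table>
"""


def to_number(text):
    """Convert a cell's text to float; an empty cell or '-' (no quote) becomes None."""
    text = text.strip()
    return None if text in ("", "-") else float(text)


def parse_strikes(page_source):
    """Return the strike of every row in the prices table (hypothetical helper)."""
    soup = BeautifulSoup(page_source, "html.parser")
    table = soup.find("table", {"id": "prices_tables_0"})
    if table is None:  # table not rendered yet, or the page layout changed
        return []
    strikes = []
    for row in table.find_all("tr"):
        # Only the strike cell is a <td> carrying this class; the C/P links are <a> tags.
        cell = row.find("td", class_="font-weight-bold")
        if cell is not None:
            strikes.append(to_number(cell.get_text()))
    return strikes


strikes = parse_strikes(ROW_HTML)
print(strikes)
```

In the starter script, the same function could be given `driver.page_source` once the page has finished rendering; the first element of the returned list would then be the first strike shown in the table.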