getting financial data from yahoo finance
Python Forum (https://python-forum.io), Web Scraping & Web Development (https://python-forum.io/forum-13.html), thread /thread-27068.html

getting financial data from yahoo finance - asiaphone12 - May-25-2020

Hello, I have a problem scraping data from Yahoo Finance. I searched the forum, but everything I found is about stock data, not financial data. I want to get the Income Statement, Balance Sheet and Cash Flow for valuation. Here is the code (credit to Matt Button):

    from datetime import datetime
    import lxml
    from lxml import html
    import requests
    import numpy as np
    import pandas as pd

    symbol = 'INDF.JK'
    url = 'https://finance.yahoo.com/quote/' + symbol + '/balance-sheet?p=' + symbol

    # Set up the request headers.
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'en-US,en;q=0.9',
        'Cache-Control': 'max-age=0',
        'Pragma': 'no-cache',
        'Referer': 'https://google.com',  # the HTTP header is spelled "Referer"
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36'
    }

    # Fetch the page. The headers must be passed by keyword; the second
    # positional argument of requests.get() is params, not headers.
    page = requests.get(url, headers=headers)

    # Parse the page with lxml.
    tree = html.fromstring(page.content)
    table_rows = tree.xpath("//div[contains(@class, 'D(tbr)')]")

    # Ensure that some table rows are found.
    assert len(table_rows) > 0

    parsed_rows = []
    for table_row in table_rows:
        parsed_row = []
        # ~ print(table_row)
        el = table_row.xpath("./div")
        none_count = 0
        for rs in el:
            try:
                # Exactly one text node is expected per cell.
                (text,) = rs.xpath('.//span/text()[1]')
                parsed_row.append(text)
            except ValueError:
                parsed_row.append(np.nan)  # np.NaN was removed in NumPy 2.0
                none_count += 1
        # Keep only rows that are not mostly empty.
        if none_count < 4:
            parsed_rows.append(parsed_row)

    df = pd.DataFrame(parsed_rows)
    print(df)

The output does not contain the complete data, such as cash and cash equivalents, inventory, and so on. When I download the web page and then parse the saved file, I get the complete data.

    from datetime import datetime
    import lxml
    from lxml import html
    import requests
    import numpy as np
    import pandas as pd
    import os

    os.chdir(r'D:\ahmad\python\web')

    # Read the page that was saved from the browser.
    with open('INDF.JK 6,425.00 -325.00 -4.81% Indofood Sukses Makmur Tbk. - Yahoo Finance.html') as a:
        page = a.read()

    tree = html.fromstring(page)
    table_rows = tree.xpath("//div[contains(@class, 'D(tbr)')]")
    assert len(table_rows) > 0

    parsed_rows = []
    for table_row in table_rows:
        parsed_row = []
        # ~ print(table_row)
        el = table_row.xpath("./div")
        none_count = 0
        for rs in el:
            try:
                (text,) = rs.xpath('.//span/text()[1]')
                parsed_row.append(text)
            except ValueError:
                parsed_row.append(np.nan)  # np.NaN was removed in NumPy 2.0
                none_count += 1
        if none_count < 4:
            parsed_rows.append(parsed_row)

    df = pd.DataFrame(parsed_rows)
    print(df)

How can I get the complete data without downloading the complete page? By the way, even the saved page does not give the complete data for Total Non-current Assets and below, such as land, buildings, machinery, etc. How can I get any of that?

RE: getting financial data from yahoo finance - asiaphone12 - May-26-2020

I have now found the problem, but not the solution. The problem is that the page has toggle buttons. When a button is not expanded, the row looks like this:

    <div class="" data-test="fin-row" data-reactid="66">

When the button is expanded, it looks like this:

    <div class="rw-expnded" data-test="fin-row" data-reactid="66">

I tried to change the class value from "" to "rw-expnded", but to no avail. I can find the element with XPath, but how do I change the class value? Can you give me a pointer? My code:

    import lxml
    from lxml import html
    from lxml.html import fromstring, tostring

    # Open the saved HTML file.
    filename = 'INDF.JK 6,425.00 -325.00 -4.81% Indofood Sukses Makmur Tbk. - Yahoo Finance.html'
    with open(filename) as a:
        page = a.read()

    # Parse the file.
    tree = html.fromstring(page)

    # Find the div elements whose data-test attribute contains "fin-row".
    buttons = tree.xpath("//div[contains(@data-test, 'fin-row')]")

How do I change the class value from class="" to class="rw-expnded"?
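
As a narrow answer to the attribute question: lxml elements follow the ElementTree API, so an attribute is changed with `set()` and read with `get()`. The sketch below uses the standard library's `xml.etree.ElementTree` on a made-up fragment modelled on the markup quoted above, so it runs without extra packages; the row labels are invented. Keep in mind that this only edits the parsed copy in memory. It does not run the page's JavaScript, so it cannot by itself add rows that are not already present in the HTML.

```python
import xml.etree.ElementTree as ET

# Made-up fragment modelled on the markup quoted in the post; labels are invented.
fragment = (
    '<section>'
    '<div class="" data-test="fin-row" data-reactid="66">Total Assets</div>'
    '<div class="" data-test="fin-row" data-reactid="70">Total Liabilities</div>'
    '</section>'
)

root = ET.fromstring(fragment)

# Find every div whose data-test attribute is exactly "fin-row".
rows = root.findall(".//div[@data-test='fin-row']")

# set() overwrites (or creates) an attribute on the in-memory element.
for row in rows:
    row.set("class", "rw-expnded")

classes = [row.get("class") for row in rows]
serialized = ET.tostring(root, encoding="unicode")
print(classes)
print(serialized)
```

With lxml the same `row.set("class", "rw-expnded")` call applies to the elements returned by `tree.xpath(...)`.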

RE: getting financial data from yahoo finance - snippsat - May-26-2020

(May-26-2020, 02:58 PM)asiaphone12 Wrote: I can find the element with xpath, but how to change the class value? can you give me pointer?

You will need Selenium for this. There are also libraries such as yfinance that work after the API change from Yahoo.

RE: getting financial data from yahoo finance - asiaphone12 - May-27-2020

(May-26-2020, 04:15 PM)snippsat Wrote: You will need Selenium for this.

I have tried yfinance. When I use it to retrieve financial data for the Indonesia Stock Exchange, it returns empty data. That's why I want to create my own script. I have tried lxml and bs4, and I'll try Selenium next. It's kind of fun to develop my own program, and I learn new things every day. At least I have something to do while stuck at home, lol.

RE: getting financial data from yahoo finance - asiaphone12 - May-27-2020

I found out how to click the toggle buttons with Selenium, but I can only click the first level of buttons. The page has buttons nested inside buttons, up to 4 levels. My code:

    from pathlib import Path

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    # Path to the locally saved HTML file.
    filename1 = r'D:\ahmad\python\web\INDF.JK 6,425.00 -325.00 -4.81% Indofood Sukses Makmur Tbk. - Yahoo Finance.html'

    # Open the file in Firefox; Path.as_uri() builds a proper file:/// URL.
    driver = webdriver.Firefox()
    driver.get(Path(filename1).as_uri())

    # Look up the toggle buttons at each level.
    level1 = driver.find_elements(By.XPATH, "//button[contains(@class, 'tgglBtn')]")
    level2 = driver.find_elements(By.XPATH, "//button[contains(@class, 'tgglBtn')]//button[contains(@class, 'tgglBtn')]")
    level3 = driver.find_elements(By.XPATH, "//button[contains(@class, 'tgglBtn')]//button[contains(@class, 'tgglBtn')]//button[contains(@class, 'tgglBtn')]")
    level4 = driver.find_elements(By.XPATH, "//button[contains(@class, 'tgglBtn')]//button[contains(@class, 'tgglBtn')]//button[contains(@class, 'tgglBtn')]//button[contains(@class, 'tgglBtn')]")

    # Click all the buttons.
    for elemlevel1 in level1:
        elemlevel1.click()
    for elemlevel2 in level2:
        elemlevel2.click()
    for elemlevel3 in level3:
        elemlevel3.click()
    for elemlevel4 in level4:
        elemlevel4.click()

It clicks the level 1 buttons, but the next levels do not get clicked. How do I click those buttons?

RE: getting financial data from yahoo finance - snippsat - May-27-2020

(May-27-2020, 10:32 AM)asiaphone12 Wrote: it can click the level 1 button, but for the next level didn't get clicked. how to click those buttons?

There is an Expand All button/link that opens everything. Hmm, are you able to run this from local HTML? There is a lot going on that a local HTML file may not have access to, e.g. JavaScript/Ajax/JSON. Here is a test: I must first click an accept button to get in, then Expand All. After that you can try to get the values shown in the expanded layout.

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from bs4 import BeautifulSoup
    import time

    #--| Setup
    options = Options()
    #options.add_argument("--headless")
    #options.add_argument('--disable-gpu')
    #options.add_argument('--log-level=3')
    # Selenium 4 takes the driver path through a Service object
    # (the executable_path argument was removed).
    browser = webdriver.Chrome(service=Service(r'C:\cmder\bin\chromedriver.exe'), options=options)

    #--| Parse or automation
    browser.get('https://finance.yahoo.com/quote/INDF.JK/balance-sheet?p=INDF.JK')
    time.sleep(3)

    # Click the accept button on the consent page.
    # Selenium 4 uses find_elements(By..., ...) instead of find_elements_by_*.
    accept_button = browser.find_elements(By.CSS_SELECTOR, '#consent-page > div > div > div > div.wizard-footer > div > form > button.btn.primary')
    accept_button[0].click()
    time.sleep(3)

    # Click "Expand All".
    expand = browser.find_elements(By.XPATH, '//*[@id="Col1-1-Financials-Proxy"]/section/div[2]/button')
    expand[0].click()

    # Example: send the page source to BeautifulSoup for parsing.
    # A raw string keeps the backslash escapes in the selector intact.
    soup = BeautifulSoup(browser.page_source, 'lxml')
    price = soup.select_one(r'#Col1-1-Financials-Proxy > section > div.Pos\(r\) > div.W\(100\%\).Whs\(nw\).Ovx\(a\).BdT.Bdtc\(\$seperatorColor\) > div.M\(0\).Whs\(n\).BdEnd.Bdc\(\$seperatorColor\).D\(itb\) > div.D\(tbrg\) > div:nth-child(1) > div:nth-child(2) > div:nth-child(1) > div:nth-child(2) > div:nth-child(1) > div:nth-child(2) > div:nth-child(1) > div:nth-child(2) > div:nth-child(1) > div.D\(tbr\).fi-row.Bgc\(\$hoverBgColor\)\:h > div:nth-child(2) > span')
    print(price.text)
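
On this page the Expand All button makes per-level clicking unnecessary. For pages without such a button, two things in the earlier four-level attempt are worth noting: all four lists were looked up before any click happened, and an XPath of the form `//button[...]//button[...]` only matches a button that sits inside another button. If the deeper toggles only show up after their parent row is opened (an assumption about the page, not confirmed in this thread), one option is to look up the closed toggles again after every round of clicks until none are left. The sketch below shows that loop in plain Python; `click_until_stable` and `FakePage` are hypothetical names, and `FakePage` with its invented row labels only stands in for a browser so the example runs without Selenium.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def click_until_stable(find_collapsed: Callable[[], List[T]],
                       click: Callable[[T], None],
                       max_rounds: int = 10) -> int:
    """Re-query the closed toggles and click them until none are left.

    Returns the total number of clicks. max_rounds guards against looping forever.
    """
    total = 0
    for _ in range(max_rounds):
        buttons = find_collapsed()  # fresh lookup on every round
        if not buttons:
            break
        for button in buttons:
            click(button)
            total += 1
    return total


class FakePage:
    """Stand-in for a browser page: children are visible only after the parent is clicked."""

    def __init__(self, tree):
        self.tree = tree          # {label: {child_label: {...}}}
        self.expanded = set()

    def collapsed(self):
        found = []

        def walk(node):
            for label, children in node.items():
                if children and label not in self.expanded:
                    found.append(label)   # visible, has children, still closed
                elif label in self.expanded:
                    walk(children)        # descend only into opened rows

        walk(self.tree)
        return found

    def click(self, label):
        self.expanded.add(label)


# Invented row hierarchy, three levels of toggles deep.
tree = {
    "Total Assets": {
        "Current Assets": {"Cash": {}, "Inventory": {}},
        "Non-current Assets": {"Net PPE": {"Land": {}}},
    },
    "Total Liabilities": {},
}
page = FakePage(tree)
total = click_until_stable(page.collapsed, page.click)
print(total, sorted(page.expanded))
```

With Selenium the two callables might be `lambda: driver.find_elements(By.XPATH, collapsed_xpath)` and `lambda button: button.click()`, where `collapsed_xpath` is a hypothetical locator that matches only toggles that are still closed; the thread does not say how Yahoo marks those.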

RE: getting financial data from yahoo finance - asiaphone12 - May-28-2020

(May-27-2020, 03:20 PM)snippsat Wrote: There is an Expand All button/link

OH MY GOD!!! I DIDN'T SEE THAT!!! Looks like I missed that, lol. I can run this file because I saved it in complete format, so it works offline. I use a mobile hotspot for internet access, so I have a hard time accessing the page online. The waiting time is long, so I am using a local HTML file. Thanks for the pointer, my script is now complete. My code:

    from pathlib import Path
    from time import sleep

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from lxml import html
    import lxml
    import numpy as np
    import pandas as pd

    # Path to the locally saved HTML file.
    filename1 = r'D:\ahmad\python\web\INDF.JK.html'

    # Open the file in Firefox; Path.as_uri() builds a proper file:/// URL.
    driver = webdriver.Firefox()
    driver.get(Path(filename1).as_uri())
    sleep(5)

    # Click "Expand All".
    btnclick = driver.find_elements(By.XPATH, "//*[@id='Col1-1-Financials-Proxy']/section/div[2]/button")
    btnclick[0].click()

    # Parse the expanded page with lxml.
    tree = html.fromstring(driver.page_source)

    # Find the rows of the financial data table.
    table_rows = tree.xpath("//div[contains(@class, 'D(tbr)')]")

    # Ensure that some table rows are found.
    assert len(table_rows) > 0

    parsed_rows = []
    for table_row in table_rows:
        parsed_row = []
        el = table_row.xpath("./div")
        none_count = 0
        for rs in el:
            try:
                (text,) = rs.xpath('.//span/text()[1]')
                parsed_row.append(text)
            except ValueError:
                parsed_row.append(np.nan)  # np.NaN was removed in NumPy 2.0
                none_count += 1
        if none_count < 4:
            parsed_rows.append(parsed_row)

    df = pd.DataFrame(parsed_rows)
    print(df)

Thanks for the guidance.
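
Since the goal stated at the start of the thread is valuation, the scraped cells still need to be turned from strings into numbers, and looking a row up by its label is simpler than a long positional selector. A minimal sketch, assuming the cells use commas as thousands separators and a lone "-" for missing values (neither is confirmed in this thread); `to_number`, `rows_to_dict` and the sample figures are made up for illustration.

```python
import math


def to_number(cell):
    """Convert a scraped cell such as '1,000,500' to a float; NaN if it is not numeric."""
    if not isinstance(cell, str):
        return math.nan
    text = cell.strip().replace(",", "")
    try:
        return float(text)
    except ValueError:
        return math.nan


def rows_to_dict(parsed_rows):
    """Map each row label (the first cell) to its list of numeric values."""
    table = {}
    for row in parsed_rows:
        if not row or not isinstance(row[0], str):
            continue  # skip rows without a text label
        table[row[0]] = [to_number(cell) for cell in row[1:]]
    return table


# Made-up rows in the same shape as parsed_rows; the figures are invented.
sample_rows = [
    ["Breakdown", "12/31/2019", "12/31/2018"],
    ["Total Assets", "1,000,500", "900,250"],
    ["Inventory", "12,345", "-"],
]
table = rows_to_dict(sample_rows)
print(table["Total Assets"])
print(table["Inventory"])
```

The same helpers could be applied to the `parsed_rows` list built in the script above before it is passed to `pd.DataFrame`.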

RE: getting financial data from yahoo finance - mick_g - Jun-15-2020

Thanks for your nice solution. Do you know a way to get the expanded view and the quarterly view at the same time?