getting financial data from yahoo finance
#1
Hello, I have a problem scraping data from Yahoo Finance. I searched the forum, but everything I found is about stock price data, not financial data. I want to get the Income Statement, Balance Sheet and Cash Flow for valuation.

Here is the code (credit to Matt Button):
from datetime import datetime
import lxml
from lxml import html
import requests
import numpy as np
import pandas as pd

symbol = 'INDF.JK'

url = 'https://finance.yahoo.com/quote/' + symbol + '/balance-sheet?p=' + symbol

# Set up the request headers.
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'Cache-Control': 'max-age=0',
    'Pragma': 'no-cache',
    'Referrer': 'https://google.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36'
}

# Fetch the page (headers must be passed as a keyword argument,
# otherwise requests treats them as query parameters).
page = requests.get(url, headers=headers)

# Parse the page with LXML.
tree = html.fromstring(page.content)


table_rows = tree.xpath("//div[contains(@class, 'D(tbr)')]")


# Ensure that some table rows are found.
assert len(table_rows) > 0

parsed_rows = []

for table_row in table_rows:
    parsed_row = []
    # ~ print(table_row)
    el = table_row.xpath("./div")
    none_count = 0
    
    for rs in el:
        try:
            (text,) = rs.xpath('.//span/text()[1]')
            parsed_row.append(text)
        except ValueError:
            parsed_row.append(np.NaN)
            none_count += 1

    if (none_count < 4):
        parsed_rows.append(parsed_row)

df = pd.DataFrame(parsed_rows)
print(df)
It gives this output:
Output:
0   Breakdown                                 12/31/2019      12/31/2018      12/31/2017      12/31/2016
1   Total Assets                              96,198,559,000  96,537,796,000  87,939,488,000  82,174,515,000
2   Total Liabilities Net Minority Interest   41,996,071,000  46,620,996,000  41,182,764,000  38,233,092,000
3   Total Equity Gross Minority Interest      54,202,488,000  49,916,800,000  46,756,724,000  43,941,423,000
4   Total Capitalization                      46,732,924,000  41,103,855,000  42,785,937,000  40,862,141,000
5   Common Stock Equity                       37,777,948,000  33,614,280,000  31,178,844,000  28,974,286,000
6   Net Tangible Assets                       31,461,529,000  27,157,067,000  25,379,979,000  22,667,765,000
7   Working Capital                           6,716,583,000   2,068,516,000   10,877,636,000  9,766,002,000
8   Invested Capital                          60,755,105,000  63,341,015,000  55,496,540,000  51,385,909,000
9   Tangible Book Value                       31,461,529,000  27,157,067,000  25,379,979,000  22,667,765,000
10  Total Debt                                22,977,157,000  29,726,735,000  24,317,696,000  22,411,623,000
11  Net Debt                                  9,232,039,000   20,917,482,000  10,627,698,000  9,049,387,000
12  Share Issued                              8,780,427       8,780,427       8,780,427       8,780,427
13  Ordinary Shares Number                    8,780,427       8,780,427       8,780,427       8,780,427
It did not get the complete data, such as cash and cash equivalents, inventory, and so on.
When I download the web page first and then parse the saved file, it gives the complete data:

from datetime import datetime
import lxml
from lxml import html
import requests
import numpy as np
import pandas as pd
import os

os.chdir(r'D:\ahmad\python\web')
with open('INDF.JK 6,425.00 -325.00 -4.81% Indofood Sukses Makmur Tbk. - Yahoo Finance.html') as a:
	page = a.read()

tree = html.fromstring(page)

table_rows = tree.xpath("//div[contains(@class, 'D(tbr)')]")

assert len(table_rows) > 0

parsed_rows = []

for table_row in table_rows:
    parsed_row = []
    # ~ print(table_row)
    el = table_row.xpath("./div")
    none_count = 0
    
    for rs in el:
        try:
            (text,) = rs.xpath('.//span/text()[1]')
            parsed_row.append(text)
        except ValueError:
            parsed_row.append(np.NaN)
            none_count += 1

    if (none_count < 4):
        parsed_rows.append(parsed_row)

df = pd.DataFrame(parsed_rows)
print(df)
The output:
Output:
0   Breakdown                                          12/30/2019      12/30/2018      12/30/2017      12/30/2016
1   Total Assets                                       96,198,559,000  96,537,796,000  87,939,488,000  82,174,515,000
2   Current Assets                                     31,403,445,000  33,272,618,000  32,515,399,000  28,985,443,000
3   Cash, Cash Equivalents & Short Term Investments    13,800,610,000  12,928,189,000  14,490,157,000  13,896,374,000
4   Cash And Cash Equivalents                          13,745,118,000  8,809,253,000   13,689,998,000  13,362,236,000
5   Cash                                               4,714,869,000   4,489,205,000   3,564,920,000   4,251,630,000
6   Cash Equivalents                                   9,030,249,000   4,320,048,000   10,125,078,000  9,110,606,000
7   Other Short Term Investments                       55,492,000      4,118,936,000   800,159,000     534,138,000
8   Inventory                                          9,658,705,000   11,644,156,000  9,690,981,000   8,469,821,000
9   Prepaid Assets                                     1,262,100,000   1,610,941,000   1,275,500,000   1,233,831,000
10  Assets Held for Sale Current                       NaN             NaN             NaN             0
11  Other Current Assets                               717,620,000     516,656,000     205,876,000     180,900,000
12  Total non-current assets                           64,795,114,000  63,265,178,000  55,424,089,000  53,189,072,000
13  Total Liabilities Net Minority Interest            41,996,071,000  46,620,996,000  41,182,764,000  38,233,092,000
14  Total Equity Gross Minority Interest               54,202,488,000  49,916,800,000  46,756,724,000  43,941,423,000
15  Total Capitalization                               46,732,924,000  41,103,855,000  42,785,937,000  40,862,141,000
16  Common Stock Equity                                37,777,948,000  33,614,280,000  31,178,844,000  28,974,286,000
17  Net Tangible Assets                                31,461,529,000  27,157,067,000  25,379,979,000  22,667,765,000
18  Working Capital                                    6,716,583,000   2,068,516,000   10,877,636,000  9,766,002,000
19  Invested Capital                                   60,755,105,000  63,341,015,000  55,496,540,000  51,385,909,000
20  Tangible Book Value                                31,461,529,000  27,157,067,000  25,379,979,000  22,667,765,000
21  Total Debt                                         22,977,157,000  29,726,735,000  24,317,696,000  22,411,623,000
22  Net Debt                                           9,232,039,000   20,917,482,000  10,627,698,000  9,049,387,000
23  Share Issued                                       8,780,427       8,780,427       8,780,427       8,780,427
24  Ordinary Shares Number                             8,780,427       8,780,427       8,780,427       8,780,427
How can I get the complete data without downloading the whole page first?
By the way, even the saved page still doesn't give the detail rows below Total non-current assets, such as land, buildings, machinery, etc.
How do I get those?
#2
Now I have found the problem, but I haven't found the solution yet. The page has expand buttons. When a button is not expanded, the row is:
div class="" data-test="fin-row" data-reactid="66"

If the button is expanded, it becomes:
div class="rw-expnded" data-test="fin-row" data-reactid="66"

I tried to change the class value from "" to "rw-expnded", but to no avail.

I can find the element with XPath, but how do I change the class value? Can you give me a pointer?

My code:
import lxml
from lxml import html
from lxml.html import fromstring, tostring

#opening html file
filename = 'INDF.JK 6,425.00 -325.00 -4.81% Indofood Sukses Makmur Tbk. - Yahoo Finance.html'
with open(filename) as a:
	page = a.read()

#parsing the file
tree = html.fromstring(page)

#search div element, data-set attribute with fin-row value
buttons = tree.xpath("//div[contains(@data-test, 'fin-row')]")
What should I do to change the class value (class="" into class="rw-expnded")?
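For reference, changing an attribute with lxml is just Element.set(); a minimal sketch (reusing the filename from the code above) would look like this. Note that this only edits the parsed tree in memory, so it cannot create detail rows that were never saved in the HTML:
from lxml import html

# Sketch: flip every collapsed fin-row's class to "rw-expnded" in the parsed tree.
# This only changes the attribute in memory -- it cannot create detail rows that
# were never present in the saved HTML, so the hidden sub-items still won't appear.
with open(filename) as a:              # same local file as in the code above
    tree = html.fromstring(a.read())

for row in tree.xpath("//div[@data-test='fin-row']"):
    if not row.get('class'):           # class="" means the row is collapsed
        row.set('class', 'rw-expnded')

print(html.tostring(tree.xpath("//div[@data-test='fin-row']")[0])[:300])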
#3
(May-26-2020, 02:58 PM)asiaphone12 Wrote: I can find the element with XPath, but how do I change the class value? Can you give me a pointer?
You will need Selenium for this.
There are also libraries like yfinance that still work after Yahoo's API change.
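A minimal sketch of the yfinance route (whether the frames come back populated depends on the ticker and exchange):
# Minimal yfinance sketch (pip install yfinance).
# Whether these frames are populated depends on the ticker/exchange.
import yfinance as yf

ticker = yf.Ticker('INDF.JK')
print(ticker.balance_sheet)   # annual balance sheet as a DataFrame
print(ticker.financials)      # annual income statement
print(ticker.cashflow)        # annual cash flow statement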
#4
(May-26-2020, 04:15 PM)snippsat Wrote: You will need Selenium for this.
There are also libraries like yfinance that still work after Yahoo's API change.

I have tried yfinance; when I use it to retrieve financial data from the Indonesia Stock Exchange, it returns empty data. That's why I want to create my own script.

I have tried lxml and bs4; I'll try Selenium next.

It's kind of fun to develop my own program.

I learn new things every day. At least I have something to do while stuck at home.
#5
I found out how to click the toggle buttons with Selenium, but I can only click the first level of buttons. The rows have buttons nested inside buttons, up to 4 levels deep.

My code:
from selenium import webdriver
from selenium.webdriver.common.by import By

#file path access local html
filename1 = r'D:\ahmad\python\web\INDF.JK 6,425.00 -325.00 -4.81% Indofood Sukses Makmur Tbk. - Yahoo Finance.html' 

#opening file in firefox browser
driver = webdriver.Firefox()
driver.get("file:\\" + filename1)

#accessing toggle button level
level1 = driver.find_elements(By.XPATH, "//button[contains(@class, 'tgglBtn')]")
level2 = driver.find_elements(By.XPATH, "//button[contains(@class, 'tgglBtn')]//button[contains(@class, 'tgglBtn')]")
level3 = driver.find_elements(By.XPATH, "//button[contains(@class, 'tgglBtn')]//button[contains(@class, 'tgglBtn')]//button[contains(@class, 'tgglBtn')]")
level4 = driver.find_elements(By.XPATH, "//button[contains(@class, 'tgglBtn')]//button[contains(@class, 'tgglBtn')]//button[contains(@class, 'tgglBtn')]//button[contains(@class, 'tgglBtn')]")

#clicking all the button
for elemlevel1 in level1:
	elemlevel1.click()

for elemlevel2 in level2:
	elemlevel2.click()

for elemlevel3 in level3:
	elemlevel3.click()

for elemlevel4 in level4:
	elemlevel4.click()
It can click the level-1 buttons, but the buttons at the deeper levels don't get clicked. How do I click those buttons?
[Image: Screenshot-2020-05-27-Indofood-Sukses-Ma...inance.png]
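For anyone trying the nested-click route: a <button> can never contain another <button> in HTML, so the level2 to level4 XPaths above match nothing, and the deeper rows only exist once their parent row has been expanded. A minimal sketch that instead re-queries the DOM after each pass, using the class="" / "rw-expnded" toggle from post #2 as the "still collapsed" test (that locator is an assumption about the markup), would be the following. As the next reply points out, the page's own Expand All button makes this unnecessary.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import StaleElementReferenceException
from time import sleep

filename1 = r'D:\ahmad\python\web\INDF.JK 6,425.00 -325.00 -4.81% Indofood Sukses Makmur Tbk. - Yahoo Finance.html'

driver = webdriver.Firefox()
driver.get("file:\\" + filename1)

# Buttons that sit inside rows that are still collapsed (class="" rather than
# "rw-expnded"); the XPath is an assumption based on the markup shown in post #2.
COLLAPSED = "//div[@data-test='fin-row' and @class='']//button[contains(@class, 'tgglBtn')]"

for _ in range(6):                 # rows nest a few levels deep, so a few passes suffice
    buttons = driver.find_elements(By.XPATH, COLLAPSED)
    if not buttons:
        break                      # nothing left to expand
    for btn in buttons:
        try:
            btn.click()
        except StaleElementReferenceException:
            pass                   # row was re-rendered; it gets picked up on the next pass
    sleep(1)                       # give the page time to render the newly revealed rows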
#6
(May-27-2020, 10:32 AM)asiaphone12 Wrote: It can click the level-1 buttons, but the buttons at the deeper levels don't get clicked. How do I click those buttons?
There is an Expand All button/link; this opens everything.

Hmm, are you able to run this from local HTML? There is a lot going on that a local HTML file may not have access to, e.g. JavaScript/Ajax/JSON.
Here is a test; I first have to click an accept button to get in, then Expand All.
Then you can try to get a value that is shown in the expanded layout.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import time

#--| Setup
options = Options()
#options.add_argument("--headless")
#options.add_argument('--disable-gpu')
#options.add_argument('--log-level=3')
browser = webdriver.Chrome(executable_path=r'C:\cmder\bin\chromedriver.exe', options=options)
#--| Parse or automation
browser.get('https://finance.yahoo.com/quote/INDF.JK/balance-sheet?p=INDF.JK')
time.sleep(3)
accept_button = browser.find_elements_by_css_selector('#consent-page > div > div > div > div.wizard-footer > div > form > button.btn.primary')
accept_button[0].click()
time.sleep(3)
expand = browser.find_elements_by_xpath('//*[@id="Col1-1-Financials-Proxy"]/section/div[2]/button')
expand[0].click()

# Example send source to BS for parse
soup = BeautifulSoup(browser.page_source, 'lxml')
price = soup.select_one('#Col1-1-Financials-Proxy > section > div.Pos\(r\) > div.W\(100\%\).Whs\(nw\).Ovx\(a\).BdT.Bdtc\(\$seperatorColor\) > div.M\(0\).Whs\(n\).BdEnd.Bdc\(\$seperatorColor\).D\(itb\) > div.D\(tbrg\) > div:nth-child(1) > div:nth-child(2) > div:nth-child(1) > div:nth-child(2) > div:nth-child(1) > div:nth-child(2) > div:nth-child(1) > div:nth-child(2) > div:nth-child(1) > div.D\(tbr\).fi-row.Bgc\(\$hoverBgColor\)\:h > div:nth-child(2) > span')
print(price.text)
Output:
4,714,869,000
#7
(May-27-2020, 03:20 PM)snippsat Wrote: There is an Expand All button/link; this opens everything.

OH MY GOD!!! I DIDN'T SEE THAT!!!

Looks like I missed it.

I can run it from a file because I downloaded the page in complete format, so it works offline. I use a mobile hotspot to access the internet, so accessing the page online is difficult: the waiting time is long, which is why I use a local HTML file.

Thanks for the pointer, my script is now complete.

My code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep
from lxml import html
import lxml
import numpy as np
import pandas as pd

#file path access local html
filename1 = r'D:\ahmad\python\web\INDF.JK.html' 

#opening file in firefox browser
driver = webdriver.Firefox()
driver.get("file:\\" + filename1)
sleep(5)

#clicking "Expand All"
btnclick = driver.find_elements(By.XPATH, "//*[@id='Col1-1-Financials-Proxy']/section/div[2]/button")
btnclick[0].click()

#parsing into lxml
tree = html.fromstring(driver.page_source)

#searching table financial data
table_rows = tree.xpath("//div[contains(@class, 'D(tbr)')]")

# Ensure that some table rows are found
assert len(table_rows) > 0

parsed_rows = []

for table_row in table_rows:
    parsed_row = []
    el = table_row.xpath("./div")
    
    none_count = 0
    
    for rs in el:
        try:
            (text,) = rs.xpath('.//span/text()[1]')
            parsed_row.append(text)
        except ValueError:
            parsed_row.append(np.NaN)
            none_count += 1

    if (none_count < 4):
        parsed_rows.append(parsed_row)

df = pd.DataFrame(parsed_rows)
print(df)
The result is:
Output:
0   Breakdown                                          12/30/2019      12/30/2018      12/30/2017      12/30/2016
1   Total Assets                                       96,198,559,000  96,537,796,000  87,939,488,000  82,174,515,000
2   Current Assets                                     31,403,445,000  33,272,618,000  32,515,399,000  28,985,443,000
3   Cash, Cash Equivalents & Short Term Investments    13,800,610,000  12,928,189,000  14,490,157,000  13,896,374,000
4   Cash And Cash Equivalents                          13,745,118,000  8,809,253,000   13,689,998,000  13,362,236,000
..  ...                                                ...             ...             ...             ...
61  Tangible Book Value                                31,461,529,000  27,157,067,000  25,379,979,000  22,667,765,000
62  Total Debt                                         22,977,157,000  29,726,735,000  24,317,696,000  22,411,623,000
63  Net Debt                                           9,232,039,000   20,917,482,000  10,627,698,000  9,049,387,000
64  Share Issued                                       8,780,427       8,780,427       8,780,427       8,780,427
65  Ordinary Shares Number                             8,780,427       8,780,427       8,780,427       8,780,427

[66 rows x 5 columns]
Thanks for the guidance!
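As a small follow-up, the raw DataFrame still stores the header row and the line-item names as data; a quick tidy-up sketch (column names taken from the first parsed row, Breakdown used as the index, commas stripped so the values become numbers; the CSV file name is just an example):
# Tidy-up sketch for the DataFrame produced by the script above (reuses df and pd).
df.columns = df.iloc[0]                          # first parsed row holds the dates
df = df.drop(0).set_index('Breakdown')           # line-item names become the index
df = df.apply(lambda col: pd.to_numeric(col.str.replace(',', ''), errors='coerce'))

print(df.loc['Total Assets'])                    # one line item across all years
df.to_csv('INDF.JK_balance_sheet.csv')           # example output file name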
#8
Thanks for your nice solution.
Do you know a way to get it with both the expanded and the quarterly view at the same time?
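A minimal sketch of one way to do that with the same Selenium approach: click the quarterly toggle first, then Expand All, then parse the page source as before. Locating the quarterly button by its visible text is an assumption about the page markup, not verified against the live page:
from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep

driver = webdriver.Firefox()
driver.get('https://finance.yahoo.com/quote/INDF.JK/balance-sheet?p=INDF.JK')
sleep(5)

# Switch to the quarterly view first; locating the toggle by its visible text
# is an assumption about the markup -- adjust the XPath if the page differs.
quarterly = driver.find_elements(By.XPATH, "//button[.//span[text()='Quarterly']]")
if quarterly:
    quarterly[0].click()
    sleep(3)

# Then Expand All, same locator as in the script above.
expand = driver.find_elements(By.XPATH, "//*[@id='Col1-1-Financials-Proxy']/section/div[2]/button")
if expand:
    expand[0].click()
sleep(2)

# driver.page_source can now be fed to the same lxml/pandas parsing code as before.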