Python Forum
getting financial data from yahoo finance
Hello, I have a problem scraping data from Yahoo Finance. I searched the forum, but everything I found is about stock data, not financial data. I want to get the Income Statement, Balance Sheet, and Cash Flow for valuation.

Here is the code (credit to Matt Button):
from lxml import html
import requests
import numpy as np
import pandas as pd

symbol = 'INDF.JK'

url = 'https://finance.yahoo.com/quote/' + symbol + '/balance-sheet?p=' + symbol

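# Set up the request headers.
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    # 'br' is left out: requests only decodes Brotli when a brotli package is installed.
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-US,en;q=0.9',
    'Cache-Control': 'max-age=0',
    'Pragma': 'no-cache',
    # The HTTP header name is spelled 'Referer'.
    'Referer': 'https://google.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36'
}

# Fetch the page. The headers must be passed by keyword: the second positional
# argument of requests.get() is params (the query string), not headers.
page = requests.get(url, headers=headers)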

# Parse the page with LXML.
tree = html.fromstring(page.content)


table_rows = tree.xpath("//div[contains(@class, 'D(tbr)')]")


# Ensure that some table rows are found.
assert len(table_rows) > 0

parsed_rows = []

for table_row in table_rows:
    parsed_row = []
    # ~ print(table_row)
    el = table_row.xpath("./div")
    none_count = 0
    
    for rs in el:
        try:
            (text,) = rs.xpath('.//span/text()[1]')
            parsed_row.append(text)
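        except ValueError:
            # np.nan is the portable spelling; the np.NaN alias was removed in NumPy 2.0.
            parsed_row.append(np.nan)
            none_count += 1

    # Keep only rows with fewer than four empty cells.
    if none_count < 4:
        parsed_rows.append(parsed_row)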

df = pd.DataFrame(parsed_rows)
print(df)
It gives this output:
Output:
    0                                                     1               2               3               4
0   Breakdown                                    12/31/2019      12/31/2018      12/31/2017      12/31/2016
1   Total Assets                             96,198,559,000  96,537,796,000  87,939,488,000  82,174,515,000
2   Total Liabilities Net Minority Interest  41,996,071,000  46,620,996,000  41,182,764,000  38,233,092,000
3   Total Equity Gross Minority Interest     54,202,488,000  49,916,800,000  46,756,724,000  43,941,423,000
4   Total Capitalization                     46,732,924,000  41,103,855,000  42,785,937,000  40,862,141,000
5   Common Stock Equity                      37,777,948,000  33,614,280,000  31,178,844,000  28,974,286,000
6   Net Tangible Assets                      31,461,529,000  27,157,067,000  25,379,979,000  22,667,765,000
7   Working Capital                           6,716,583,000   2,068,516,000  10,877,636,000   9,766,002,000
8   Invested Capital                         60,755,105,000  63,341,015,000  55,496,540,000  51,385,909,000
9   Tangible Book Value                      31,461,529,000  27,157,067,000  25,379,979,000  22,667,765,000
10  Total Debt                               22,977,157,000  29,726,735,000  24,317,696,000  22,411,623,000
11  Net Debt                                  9,232,039,000  20,917,482,000  10,627,698,000   9,049,387,000
12  Share Issued                                  8,780,427       8,780,427       8,780,427       8,780,427
13  Ordinary Shares Number                        8,780,427       8,780,427       8,780,427       8,780,427
It does not get the complete data, such as cash and cash equivalents, inventory, and so on.
When I download the web page first and then parse the saved file, it gives the complete data.

from lxml import html
import numpy as np
import pandas as pd
import os

os.chdir(r'D:\ahmad\python\web')
# Read the page saved from the browser (assumed here to be UTF-8 encoded).
with open('INDF.JK 6,425.00 -325.00 -4.81% Indofood Sukses Makmur Tbk. - Yahoo Finance.html', encoding='utf-8') as a:
    page = a.read()

tree = html.fromstring(page)

table_rows = tree.xpath("//div[contains(@class, 'D(tbr)')]")

assert len(table_rows) > 0

parsed_rows = []

for table_row in table_rows:
    parsed_row = []
    # ~ print(table_row)
    el = table_row.xpath("./div")
    none_count = 0
    
    for rs in el:
        try:
            (text,) = rs.xpath('.//span/text()[1]')
            parsed_row.append(text)
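        except ValueError:
            # np.nan is the portable spelling; the np.NaN alias was removed in NumPy 2.0.
            parsed_row.append(np.nan)
            none_count += 1

    # Keep only rows with fewer than four empty cells.
    if none_count < 4:
        parsed_rows.append(parsed_row)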

df = pd.DataFrame(parsed_rows)
print(df)
The output:
Output:
    0                                                             1               2               3               4
0   Breakdown                                            12/30/2019      12/30/2018      12/30/2017      12/30/2016
1   Total Assets                                     96,198,559,000  96,537,796,000  87,939,488,000  82,174,515,000
2   Current Assets                                   31,403,445,000  33,272,618,000  32,515,399,000  28,985,443,000
3   Cash, Cash Equivalents & Short Term Investments  13,800,610,000  12,928,189,000  14,490,157,000  13,896,374,000
4   Cash And Cash Equivalents                        13,745,118,000   8,809,253,000  13,689,998,000  13,362,236,000
5   Cash                                              4,714,869,000   4,489,205,000   3,564,920,000   4,251,630,000
6   Cash Equivalents                                  9,030,249,000   4,320,048,000  10,125,078,000   9,110,606,000
7   Other Short Term Investments                         55,492,000   4,118,936,000     800,159,000     534,138,000
8   Inventory                                         9,658,705,000  11,644,156,000   9,690,981,000   8,469,821,000
9   Prepaid Assets                                    1,262,100,000   1,610,941,000   1,275,500,000   1,233,831,000
10  Assets Held for Sale Current                                NaN             NaN             NaN               0
11  Other Current Assets                                717,620,000     516,656,000     205,876,000     180,900,000
12  Total non-current assets                         64,795,114,000  63,265,178,000  55,424,089,000  53,189,072,000
13  Total Liabilities Net Minority Interest          41,996,071,000  46,620,996,000  41,182,764,000  38,233,092,000
14  Total Equity Gross Minority Interest             54,202,488,000  49,916,800,000  46,756,724,000  43,941,423,000
15  Total Capitalization                             46,732,924,000  41,103,855,000  42,785,937,000  40,862,141,000
16  Common Stock Equity                              37,777,948,000  33,614,280,000  31,178,844,000  28,974,286,000
17  Net Tangible Assets                              31,461,529,000  27,157,067,000  25,379,979,000  22,667,765,000
18  Working Capital                                   6,716,583,000   2,068,516,000  10,877,636,000   9,766,002,000
19  Invested Capital                                 60,755,105,000  63,341,015,000  55,496,540,000  51,385,909,000
20  Tangible Book Value                              31,461,529,000  27,157,067,000  25,379,979,000  22,667,765,000
21  Total Debt                                       22,977,157,000  29,726,735,000  24,317,696,000  22,411,623,000
22  Net Debt                                          9,232,039,000  20,917,482,000  10,627,698,000   9,049,387,000
23  Share Issued                                          8,780,427       8,780,427       8,780,427       8,780,427
24  Ordinary Shares Number                                8,780,427       8,780,427       8,780,427       8,780,427
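The cells in the output above are still strings with thousands separators, so they would need converting before any valuation arithmetic. A minimal sketch of that step, assuming a hypothetical helper named `to_number` (not part of the original script) and assuming that labels, dates, and empty cells should map to None:

```python
def to_number(text):
    """Convert a scraped cell like '96,198,559,000' to int or float, else None."""
    if not isinstance(text, str):
        return None  # e.g. the float NaN the script stores for empty cells
    cleaned = text.replace(",", "").strip()
    if cleaned.lower() in ("", "-", "nan"):
        return None
    # Try int first so whole amounts stay exact, then fall back to float.
    for convert in (int, float):
        try:
            return convert(cleaned)
        except ValueError:
            continue
    return None  # labels such as 'Total Assets' or dates such as '12/31/2019'


row = ["Total Assets", "96,198,559,000", "96,537,796,000"]
values = [to_number(cell) for cell in row[1:]]
print(values)  # [96198559000, 96537796000]
```

The same helper could be applied cell by cell to the value columns of `df` once the header row has been split off.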
How can I get this complete data without downloading the complete page?
By the way, even the saved page does not give the complete data for total non-current assets and below, such as land, buildings, machinery, etc.
How can I get those as well?
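Both scripts above repeat the same parsing loop, so one way to investigate the missing rows is to factor that loop into a function and compare the row labels from the live response with those from the saved file. The following is only a sketch under assumptions: `parse_statement_rows` and `missing_labels` are hypothetical names that do not appear in the original code, and the `D(tbr)` class selector is taken from the scripts above and may stop matching whenever Yahoo changes its markup. It uses `math.nan` instead of NumPy so that the block only needs lxml.

```python
import math

from lxml import html


def parse_statement_rows(page_source, max_missing=3):
    """Parse statement rows from HTML text or bytes into lists of cell values."""
    tree = html.fromstring(page_source)
    parsed_rows = []
    for table_row in tree.xpath("//div[contains(@class, 'D(tbr)')]"):
        parsed_row = []
        for cell in table_row.xpath("./div"):
            # Same rule as the original loop: exactly one text node, else NaN.
            texts = cell.xpath(".//span/text()[1]")
            parsed_row.append(texts[0] if len(texts) == 1 else math.nan)
        missing = sum(1 for value in parsed_row if not isinstance(value, str))
        # Keep rows with at most max_missing empty cells (original: none_count < 4).
        if missing <= max_missing:
            parsed_rows.append(parsed_row)
    return parsed_rows


def missing_labels(live_rows, saved_rows):
    """Return labels found in saved_rows but absent from live_rows, in order."""
    live = {row[0] for row in live_rows if row}
    return [
        row[0]
        for row in saved_rows
        if row and isinstance(row[0], str) and row[0] not in live
    ]
```

Calling `parse_statement_rows(page.content)` on the live response and `parse_statement_rows(page)` on the saved file's text, then passing both results to `missing_labels`, would list exactly which rows the live request lacks. One possible cause, not confirmed in this thread, is that the browser-saved file contains rows the page's JavaScript added after loading, which a plain `requests` call never sees.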
Messages In This Thread
getting financial data from yahoo finance - by asiaphone12 - May-25-2020, 12:07 AM