May-25-2020, 12:07 AM
(This post was last modified: May-25-2020, 12:07 AM by asiaphone12.)
Hello, I have a problem scraping data from Yahoo Finance. I searched the forum, but all I found is about stock data, not financial data. I want to get the Income Statement, Balance Sheet and Cash Flow for valuation.
here is the code (credit to Matt Button):
from datetime import datetime
import lxml
from lxml import html
import requests
import numpy as np
import pandas as pd

symbol = 'INDF.JK'
url = 'https://finance.yahoo.com/quote/' + symbol + '/balance-sheet?p=' + symbol

# Set up the request headers.
# Note: the standard HTTP header name is spelled 'Referer'.
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'Cache-Control': 'max-age=0',
    'Pragma': 'no-cache',
    'Referer': 'https://google.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36'
}

# Fetching the page. The headers must be passed by keyword;
# the second positional argument of requests.get() is params.
page = requests.get(url, headers=headers)

# Parse the page with LXML.
tree = html.fromstring(page.content)
table_rows = tree.xpath("//div[contains(@class, 'D(tbr)')]")

# Ensure that some table rows are found.
assert len(table_rows) > 0

parsed_rows = []
for table_row in table_rows:
    parsed_row = []
    # ~ print(table_row)
    el = table_row.xpath("./div")
    none_count = 0
    for rs in el:
        try:
            # Each cell is expected to hold exactly one text node.
            (text,) = rs.xpath('.//span/text()[1]')
            parsed_row.append(text)
        except ValueError:
            # Empty cell (or more than one match): record it as missing.
            parsed_row.append(np.nan)
            none_count += 1
    # Skip rows where nearly every cell is empty.
    if (none_count < 4):
        parsed_rows.append(parsed_row)

df = pd.DataFrame(parsed_rows)
print(df)

It gives this output:
Output: 0 1 2 3 4
0 Breakdown 12/31/2019 12/31/2018 12/31/2017 12/31/2016
1 Total Assets 96,198,559,000 96,537,796,000 87,939,488,000 82,174,515,000
2 Total Liabilities Net Minority Interest 41,996,071,000 46,620,996,000 41,182,764,000 38,233,092,000
3 Total Equity Gross Minority Interest 54,202,488,000 49,916,800,000 46,756,724,000 43,941,423,000
4 Total Capitalization 46,732,924,000 41,103,855,000 42,785,937,000 40,862,141,000
5 Common Stock Equity 37,777,948,000 33,614,280,000 31,178,844,000 28,974,286,000
6 Net Tangible Assets 31,461,529,000 27,157,067,000 25,379,979,000 22,667,765,000
7 Working Capital 6,716,583,000 2,068,516,000 10,877,636,000 9,766,002,000
8 Invested Capital 60,755,105,000 63,341,015,000 55,496,540,000 51,385,909,000
9 Tangible Book Value 31,461,529,000 27,157,067,000 25,379,979,000 22,667,765,000
10 Total Debt 22,977,157,000 29,726,735,000 24,317,696,000 22,411,623,000
11 Net Debt 9,232,039,000 20,917,482,000 10,627,698,000 9,049,387,000
12 Share Issued 8,780,427 8,780,427 8,780,427 8,780,427
13 Ordinary Shares Number 8,780,427 8,780,427 8,780,427 8,780,427
------------------
(program exited with code: 0)
Press any key to continue . . .
It did not get the complete data, such as cash and cash equivalents, inventory, and so on. When I download the web page and then parse the saved file, it gives the complete data:
from datetime import datetime
import lxml
from lxml import html
import requests
import numpy as np
import pandas as pd
import os

os.chdir(r'D:\ahmad\python\web')

# Read the page that was saved from the browser.
with open('INDF.JK 6,425.00 -325.00 -4.81% Indofood Sukses Makmur Tbk. - Yahoo Finance.html') as a:
    page = a.read()

tree = html.fromstring(page)
table_rows = tree.xpath("//div[contains(@class, 'D(tbr)')]")

# Ensure that some table rows are found.
assert len(table_rows) > 0

parsed_rows = []
for table_row in table_rows:
    parsed_row = []
    # ~ print(table_row)
    el = table_row.xpath("./div")
    none_count = 0
    for rs in el:
        try:
            # Each cell is expected to hold exactly one text node.
            (text,) = rs.xpath('.//span/text()[1]')
            parsed_row.append(text)
        except ValueError:
            # Empty cell (or more than one match): record it as missing.
            parsed_row.append(np.nan)
            none_count += 1
    # Skip rows where nearly every cell is empty.
    if (none_count < 4):
        parsed_rows.append(parsed_row)

df = pd.DataFrame(parsed_rows)
print(df)

The output:
Output: 0 1 2 3 4
0 Breakdown 12/30/2019 12/30/2018 12/30/2017 12/30/2016
1 Total Assets 96,198,559,000 96,537,796,000 87,939,488,000 82,174,515,000
2 Current Assets 31,403,445,000 33,272,618,000 32,515,399,000 28,985,443,000
3 Cash, Cash Equivalents & Short Term Investments 13,800,610,000 12,928,189,000 14,490,157,000 13,896,374,000
4 Cash And Cash Equivalents 13,745,118,000 8,809,253,000 13,689,998,000 13,362,236,000
5 Cash 4,714,869,000 4,489,205,000 3,564,920,000 4,251,630,000
6 Cash Equivalents 9,030,249,000 4,320,048,000 10,125,078,000 9,110,606,000
7 Other Short Term Investments 55,492,000 4,118,936,000 800,159,000 534,138,000
8 Inventory 9,658,705,000 11,644,156,000 9,690,981,000 8,469,821,000
9 Prepaid Assets 1,262,100,000 1,610,941,000 1,275,500,000 1,233,831,000
10 Assets Held for Sale Current NaN NaN NaN 0
11 Other Current Assets 717,620,000 516,656,000 205,876,000 180,900,000
12 Total non-current assets 64,795,114,000 63,265,178,000 55,424,089,000 53,189,072,000
13 Total Liabilities Net Minority Interest 41,996,071,000 46,620,996,000 41,182,764,000 38,233,092,000
14 Total Equity Gross Minority Interest 54,202,488,000 49,916,800,000 46,756,724,000 43,941,423,000
15 Total Capitalization 46,732,924,000 41,103,855,000 42,785,937,000 40,862,141,000
16 Common Stock Equity 37,777,948,000 33,614,280,000 31,178,844,000 28,974,286,000
17 Net Tangible Assets 31,461,529,000 27,157,067,000 25,379,979,000 22,667,765,000
18 Working Capital 6,716,583,000 2,068,516,000 10,877,636,000 9,766,002,000
19 Invested Capital 60,755,105,000 63,341,015,000 55,496,540,000 51,385,909,000
20 Tangible Book Value 31,461,529,000 27,157,067,000 25,379,979,000 22,667,765,000
21 Total Debt 22,977,157,000 29,726,735,000 24,317,696,000 22,411,623,000
22 Net Debt 9,232,039,000 20,917,482,000 10,627,698,000 9,049,387,000
23 Share Issued 8,780,427 8,780,427 8,780,427 8,780,427
24 Ordinary Shares Number 8,780,427 8,780,427 8,780,427 8,780,427
------------------
(program exited with code: 0)
Press any key to continue . . .
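Since the values come back as comma-formatted strings, they need converting to numbers before they can be used for valuation. Below is a minimal sketch of one way to do that, using only the standard library. The helper names `to_number` and `rows_to_dict` are hypothetical (they are not part of the script above), and the sample rows are copied from the output shown; it assumes the first row is the header, as in `parsed_rows`.

```python
def to_number(cell):
    """Convert a cell such as '96,198,559,000' to int; return None if missing."""
    if not isinstance(cell, str):
        # The parser stores np.nan (a float) for empty cells.
        return None
    cleaned = cell.replace(",", "").strip()
    try:
        return int(cleaned)
    except ValueError:
        # Anything that is not a plain integer is treated as missing.
        return None


def rows_to_dict(parsed_rows):
    """Map each row label to {period: value}, treating the first row as the header."""
    header, *body = parsed_rows
    periods = header[1:]
    return {
        row[0]: {period: to_number(cell) for period, cell in zip(periods, row[1:])}
        for row in body
    }


# Sample rows taken from the output above (first two periods only).
sample_rows = [
    ["Breakdown", "12/30/2019", "12/30/2018"],
    ["Total Assets", "96,198,559,000", "96,537,796,000"],
    ["Current Assets", "31,403,445,000", "33,272,618,000"],
    ["Assets Held for Sale Current", float("nan"), float("nan")],
]

balance = rows_to_dict(sample_rows)
print(balance["Total Assets"]["12/30/2019"])
print(balance["Assets Held for Sale Current"]["12/30/2018"])
```

Returning `None` for missing cells keeps them distinguishable from a genuine reported value of 0.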
How can I get this complete data without downloading the complete page? By the way, even the saved page doesn't give the complete data for total non-current assets and below, such as land, buildings, machinery, etc.
How can I get any of that?