Python Forum
help w/ scraping
#1
Hey, I'm using Python 2.7 and need to scrape this page, https://www.zacks.com/stock/quote/AAPL, for the market cap. Running this script should return the string "2,652.87 B", but it doesn't. What am I doing wrong?

from bs4 import BeautifulSoup
import requests

header = {'User-Agent': 'Mozilla/5.0'} #needed to prevent 403 error
page = requests.get(url="https://www.zacks.com/stock/quote/AAPL",headers=header)
soup = BeautifulSoup(page.content, "lxml")
for tr in soup.findAll("table",class_="abut_bottom"):
    for td in tr.find_all("td"):
        if td.text == "Market Cap":
            print td.text, td.find_next_sibling("td").text
Thanks!
#2
It's much easier to make a request for https://quote-feed.zacks.com/index.php?t=AAPL
and parse the JSON
import requests

response = requests.get(url="https://quote-feed.zacks.com/index.php?t=AAPL")
data = response.json()
print(data)
print(data['AAPL']['source']['sungard']['market_cap'])
Output:
{'AAPL': {'source': {'sungard': {'bidasksize': '400x400', 'dividend_freq': '4', 'prev_close_date': '01/24/2022 12:33:03', 'timestamp': '16:00', 'exchange': 'NASDAQ', 'shares': '', 'volatility': '1.2', 'zacks_recommendation': '', 'pos_size': '100', 'open': '160.02', 'yrlow': '116.21', 'type': 'S', 'yield': '.54', 'market_cap': '2542608189860', 'ask': '155.68', 'dividend': '.22', 'dividend_date': '11/11/2021 00:00:00', 'earnings': '5.62', 'close': '162.41', 'day_low': '154.7', 'last_trade_datetime': '01/24/2022 12:33:03', 'volume': '122848856', 'yrhigh': '182.94', 'day_high': '161.08', 'bid': '155.66', 'name': 'Apple Inc.', 'pe_ratio': '27.93', 'updated': '01/24/2022 12:33:03'}, 'bats': {'ask_size': '204', 'routed': '373038', 'last_trade_datetime': '01/24/2022 12:50:03', 'matched': '6404811', 'bid_size': '300', 'net_pct_change': 'NULL', 'updated': '01/24/2022 12:50:05', 'end_mkt_day_price': 'NULL', 'ask_price': '156.81', 'bid_price': '156.79', 'last': '174.41', 'pre_after_updated': '', 'net_price_change': 'NULL', 'pre_after_price': '', 'net_change': 'NULL'}, 'pre': {'after_percent_net_change': '-.13', 'after_net_change': '12/19/2017 19:58:07'}}, 'exchange': 'Real Time Quote from BATS', 'dividend_yield': '.54', 'ticker': 'AAPL', 'last': '156.8', 'ticker_type': 'S', 'zacks_rank_text': 'Hold', 'volume': '80981192', 'updated': 'Jan 24, 2022 12:50 PM', 'percent_net_change': '-3.454', 'zacks_rank': '3', 'name': 'Apple Inc.', 'net_change': '-5.61', 'market_time': '', 'previous_close': '162.41', 'SUNGARD_BID': '115.3', 'SUNGARD_YRLOW': '70.51', 'SUNGARD_EARNINGS': '.47', 'SUNGARD_VOLATILITY': '13-NOV-2014', 'SUNGARD_PE_RATIO': '6.45', 'SUNGARD_DAY_LOW': '1.64', 'SUNGARD_MARKET_CAP': '112.75', 'FEED_NET_CHANGE': '26-SEP-2014', 'BATS_PRE_AFTER_UPDATED': '02-DEC-2014', 'SUNGARD_YRHIGH': '119.75', 'SUNGARD_DIVIDEND_FREQ': '000', 'SUNGARD_PREV_CLOSE_DATE': '115.48', 'BATS_ASK_PRICE': '2096014', 'SUNGARD_NAME': 'APPLE INC', 'SUNGARD_TIMESTAMP': '11:50', 'SUNGARD_VOLUME': '', 
'SUNGARD_BIDASKSIZE': '13x2', 'SUNGARD_YIELD': 'Q', 'SUNGARD_DAY_HIGH': '.93', 'SUNGARD_ZACKS_RECOMMENDATION': 'S', 'SUNGARD_SHARES': '5', 'SUNGARD_DIVIDEND': '864', 'SUNGARD_DIVIDEND_DATE': '840', 'BATS_BID_SIZE': '100938', 'BATS_BID_PRICE': '520', 'BATS_LAST_TRADE_DATETIME': '99.89', 'FEED_VOLUME': '26-SEP-2014', 'BATS_ASK_SIZE': '02-DEC-2014', 'FEED_TICKER': 'AAPL', 'SUNGARD_POS_SIZE': '100', 'SUNGARD_EXCHANGE': 'NASD', 'SUNGARD_TYPE': '17.88', 'SUNGARD_LAST_TRADE_DATETIME': '02-DEC-2014', 'SUNGARD_UPDATED': '674867.19', 'BATS_ROUTED': '02-DEC-2014', 'BATS_UPDATED': '100', 'FEED_LAST': '99.88', 'FEED_PERCENT_NET_CHANGE': '115.55', 'FEED_SOURCE': '.48', 'BATS_PRE_AFTER_PRICE': '.417', 'SUNGARD_OPEN': '113.5', 'SUNGARD_CLOSE': '115.07', 'SUNGARD_ASK': '115.31', 'BATS_MATCHED': '', 'FEED_UPDATED': '34087164', 'pre_after_net_change': '', 'pre_after_percent_net_change': '', 'pe_f1': '27.93', 'ap_short_name': 'Apple', 'ticker_market_status': 'OPEN PRICES', 'market_status': 'Full Trading Day (Market Open)', 'company_short_name': 'Apple', 'previous_close_date': '01/21/2022'}} 2542608189860
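Note that every value in the feed is a string, and some are empty (for example 'shares': ''), so it may be worth guarding the lookup. A minimal sketch, assuming the nesting shown in the output above; get_market_cap is a hypothetical helper name, and SAMPLE is a trimmed copy of that output rather than a live response:

```python
import json

# Trimmed copy of the response shown above; the real feed has many more fields.
SAMPLE = (
    '{"AAPL": {"source": {"sungard": {"market_cap": "2542608189860", '
    '"close": "162.41", "ask": "155.68"}}, "ticker": "AAPL"}}'
)


def get_market_cap(data, ticker):
    """Return the sungard market cap as an int, or None if it is missing or empty."""
    raw = data.get(ticker, {}).get("source", {}).get("sungard", {}).get("market_cap")
    if not raw:
        return None
    return int(raw)


data = json.loads(SAMPLE)
print(get_market_cap(data, "AAPL"))  # integer market cap from the sample
print(get_market_cap(data, "MSFT"))  # ticker not in the sample, so None
```

With a live response, the same helper could be called on response.json() instead of the sample.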
And don't use Python 2.7; it has been unsupported for two years now.


EDIT: hm, strange the market_cap from JSON is different from one displayed on the site...

EDIT2: The market cap on the site is calculated based on previous close, i.e.
market_cap*previous_close/ask

import requests

response = requests.get(url="https://quote-feed.zacks.com/index.php?t=AAPL") 
data = response.json()
print(data)

sg = data['AAPL']['source']['sungard']
print(f"market cap: {sg['market_cap']}")
print(f"close: {sg['close']}") # or data['AAPL']['previous_close']
print(f"ask: {sg['ask']}")
print(f"market cap from site: {int(sg['market_cap'])*float(sg['close'])/float(sg['ask'])/1000000000:.2f} B")
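As a quick check of the formula from EDIT2 without hitting the network, here is a sketch that plugs in the 'sungard' figures from the sample output earlier in this post. The function name site_market_cap_billions is made up for illustration, and the result depends on whatever the ask was at request time, so it will not necessarily match the number on the site to the cent.

```python
def site_market_cap_billions(market_cap, close, ask):
    """Rescale the feed's market cap by close/ask and express it in billions."""
    return int(market_cap) * float(close) / float(ask) / 1e9


# Values copied from the 'sungard' block of the sample output earlier in this post.
sample = {"market_cap": "2542608189860", "close": "162.41", "ask": "155.68"}

value = site_market_cap_billions(sample["market_cap"], sample["close"], sample["ask"])
formatted = f"{value:,.2f} B"  # thousands separator, two decimals, like the site
print(formatted)
```

For these sample figures the result lands near the 2,652.87 B quoted in the first post, which is consistent with the close/ask rescaling guess.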
#3
Hi, I appreciate the alternative solution but I must follow the course to a tee and scrape the numbers the instructor says, or subsequent instructions in the later videos might break. I'm familiar with excel (which is used in the course extensively) but not python so I don't want to deviate too much from the original instructions.

The script worked last year but the page must've changed since then. Do I change table class to td class?

edit: The instructor uses python 2.7 in the video, so I have no choice. :(
#4
(Jan-25-2022, 01:35 PM)Hikki Wrote: edit: The instructor uses python 2.7 in the video, so I have no choice. :(
It's really not okay to follow a course today that uses Python 2.7.
It will just cause a lot of trouble; many libraries have been dropping support for Python 2.7,
like BeautifulSoup, which you use here.
Doc Wrote:Beautiful Soup's support for Python 2 was discontinued on December 31, 2020: one year after the sunset date for Python 2 itself.
From this point onward, new Beautiful Soup development will exclusively target Python 3.
The final release of Beautiful Soup 4 to support Python 2 was 4.9.3.
To fix the code:
from bs4 import BeautifulSoup
import requests

header = {'User-Agent': 'Mozilla/5.0'} #needed to prevent 403 error
page = requests.get(url="https://www.zacks.com/stock/quote/AAPL", headers=header)
soup = BeautifulSoup(page.content, "lxml")
market_cap = soup.select_one('#stock_activity > dl:nth-child(8)')
print(market_cap.text.strip())
print('-' * 30)
for item in soup.find_all('section', id="stock_activity"):
    print(item)
Output:
Market Cap 2,639.96 B
------------------------------
<section id="stock_activity">
<h3>Stock Activity</h3>
<dl class="abut_bottom"> <dt class="alpha">Open</dt> <dd>158.98</dd> </dl>
<dl class="abut_bottom"> <dt class="alpha">Day Low</dt> <dd>157.54</dd> </dl>
<dl class="abut_bottom"> <dt class="alpha">Day High</dt> <dd>159.75</dd> </dl>
<dl class="abut_bottom"> <dt class="alpha">52 Wk Low</dt> <dd>116.21</dd> </dl>
<dl class="abut_bottom"> <dt>52 Wk High</dt> <dd>182.94</dd> </dl>
<dl class="abut_bottom"> <dt class="alpha">Avg. Volume</dt> <dd><span>86,655,312</span></dd> </dl>
<dl class="abut_bottom"> <dt class="alpha">Market Cap</dt> <dd><span>2,639.96 B</span></dd> </dl>
<dl class="abut_bottom"> <dt class="alpha"><a class="newwin" href="/stock/research/AAPL/earnings-announcements?tab=dividends">Dividend</a></dt> <dd><span>0.88 ( 0.54%)</span></dd> </dl>
<dl class="abut_bottom"> <dt class="alpha"><a class="newwin" href="/stock/chart/AAPL/fundamental/beta">Beta</a></dt> <dd><span>1.20</span></dd> </dl>
</section>
#5
That's probably true but my hands are tied until he updates his course to use python 3.

Your code gave me an error "NotImplementedError: Only the following pseudo-classes are implemented: nth-of-type."

It would be preferable if I could make minimal modifications so I don't interrupt the workflow. If the page changed, can't I just change the words soup looks for to find the item?
Reply
#6
(Jan-25-2022, 04:24 PM)Hikki Wrote: Your code gave me an error "NotImplementedError: Only the following pseudo-classes are implemented: nth-of-type."
The version of BS that you use is old (because you are not on Python 3), so it does not support lines 7 and 8 of my code (which use a CSS selector).
from bs4 import BeautifulSoup
import requests

header = {'User-Agent': 'Mozilla/5.0'} #needed to prevent 403 error
page = requests.get(url="https://www.zacks.com/stock/quote/AAPL", headers=header)
soup = BeautifulSoup(page.content, "lxml")
market_cap = soup.findAll('section', id="stock_activity")
print(market_cap[0].findAll('dd')[6].text)
print('-' * 30)
for item in soup.findAll('section', id="stock_activity"):
    print(item)
Output:
2,639.96 B
------------------------------
<section id="stock_activity">
<h3>Stock Activity</h3>
<dl class="abut_bottom"> <dt class="alpha">Open</dt> <dd>158.98</dd> </dl>
<dl class="abut_bottom"> <dt class="alpha">Day Low</dt> <dd>157.02</dd> </dl>
<dl class="abut_bottom"> <dt class="alpha">Day High</dt> <dd>159.79</dd> </dl>
<dl class="abut_bottom"> <dt class="alpha">52 Wk Low</dt> <dd>116.21</dd> </dl>
<dl class="abut_bottom"> <dt>52 Wk High</dt> <dd>182.94</dd> </dl>
<dl class="abut_bottom"> <dt class="alpha">Avg. Volume</dt> <dd><span>86,655,312</span></dd> </dl>
<dl class="abut_bottom"> <dt class="alpha">Market Cap</dt> <dd><span>2,639.96 B</span></dd> </dl>
<dl class="abut_bottom"> <dt class="alpha"><a class="newwin" href="/stock/research/AAPL/earnings-announcements?tab=dividends">Dividend</a></dt> <dd><span>0.88 ( 0.54%)</span></dd> </dl>
<dl class="abut_bottom"> <dt class="alpha"><a class="newwin" href="/stock/chart/AAPL/fundamental/beta">Beta</a></dt> <dd><span>1.20</span></dd> </dl>
</section>
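The positional index [6] will break again if a row is added or reordered on the page. A minimal sketch of looking the value up by its label instead, assuming the dt/dd markup shown in the output above. find_value_by_label is a hypothetical helper name, SAMPLE_HTML is a trimmed copy of that output rather than the live page, and html.parser is used only so the sketch runs without lxml. It sticks to find_all, get_text and find_next_sibling and avoids CSS selectors, so it should also work on older Beautiful Soup releases.

```python
from bs4 import BeautifulSoup

# Fragment copied from the output above; the live page may differ.
SAMPLE_HTML = """
<section id="stock_activity">
<h3>Stock Activity</h3>
<dl class="abut_bottom"> <dt class="alpha">Avg. Volume</dt> <dd><span>86,655,312</span></dd> </dl>
<dl class="abut_bottom"> <dt class="alpha">Market Cap</dt> <dd><span>2,639.96 B</span></dd> </dl>
<dl class="abut_bottom"> <dt class="alpha"><a class="newwin" href="/stock/chart/AAPL/fundamental/beta">Beta</a></dt> <dd><span>1.20</span></dd> </dl>
</section>
"""


def find_value_by_label(soup, label):
    """Return the text of the <dd> following the <dt> whose text equals label, or None."""
    for dt in soup.find_all("dt"):
        if dt.get_text().strip() == label:
            dd = dt.find_next_sibling("dd")
            if dd is not None:
                return dd.get_text().strip()
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
market_cap = find_value_by_label(soup, "Market Cap")
print(market_cap)
```

Against the real page, the same helper could be called on the soup built from page.content in the code above.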
(Jan-25-2022, 04:24 PM)Hikki Wrote: That's probably true but my hands are tied until he updates his course to use python 3.
You should tell him that teaching a course that uses Python 2.7 is not okay at all.