Python Forum
Webscraping with beautifulsoup
#3
(Aug-23-2023, 07:09 AM)snippsat Wrote:
(Aug-23-2023, 12:04 AM)cormanstan Wrote: It just appears that I'm being blocked by the website so no data is being passed along. Is this correct? Any suggestions?
Set user agent then it will work.
import requests
from bs4 import BeautifulSoup

url = 'https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue'
headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')
print(soup.find('h2').text)
Output:
Tesla Revenue 2010-2023 | TSLA

Thanks for the reply; it appears to have worked. May I ask why adding the user agent was so important? I also have another question, if you don't mind.

I'm trying to use the make_graph function on several datasets and keep getting the same error.

make_graph(gme_data, gme_revenue, 'GameStop')
Error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[140], line 1
----> 1 make_graph(gme_data, gme_revenue, 'GameStop')

Cell In[120], line 6, in make_graph(stock_data, revenue_data, stock)
      4 revenue_data_specific = revenue_data[revenue_data.Date <= '2021-04-30']
      5 fig.add_trace(go.Scatter(x=pd.to_datetime(stock_data_specific.Date, infer_datetime_format=True), y=stock_data_specific.Close.astype("float"), name="Share Price"), row=1, col=1)
----> 6 fig.add_trace(go.Scatter(x=pd.to_datetime(revenue_data_specific.Date, infer_datetime_format=True), y=revenue_data_specific.Revenue.astype("float"), name="Revenue"), row=2, col=1)
      7 fig.update_xaxes(title_text="Date", row=1, col=1)
      8 fig.update_xaxes(title_text="Date", row=2, col=1)

File ~\anaconda3\Lib\site-packages\pandas\core\generic.py:6240, in NDFrame.astype(self, dtype, copy, errors)
   6233     results = [
   6234         self.iloc[:, i].astype(dtype, copy=copy)
   6235         for i in range(len(self.columns))
   6236     ]
   6238 else:
   6239     # else, only a single dtype is given
-> 6240     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   6241     return self._constructor(new_data).__finalize__(self, method="astype")
   6243 # GH 33113: handle empty frame or series

File ~\anaconda3\Lib\site-packages\pandas\core\internals\managers.py:448, in BaseBlockManager.astype(self, dtype, copy, errors)
    447 def astype(self: T, dtype, copy: bool = False, errors: str = "raise") -> T:
--> 448     return self.apply("astype", dtype=dtype, copy=copy, errors=errors)

File ~\anaconda3\Lib\site-packages\pandas\core\internals\managers.py:352, in BaseBlockManager.apply(self, f, align_keys, ignore_failures, **kwargs)
    350     applied = b.apply(f, **kwargs)
    351 else:
--> 352     applied = getattr(b, f)(**kwargs)
    353 except (TypeError, NotImplementedError):
    354     if not ignore_failures:

File ~\anaconda3\Lib\site-packages\pandas\core\internals\blocks.py:526, in Block.astype(self, dtype, copy, errors)
    508 """
    509 Coerce to the new dtype.
    510
   (...)
    522 Block
    523 """
    524 values = self.values
--> 526 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
    528 new_values = maybe_coerce_values(new_values)
    529 newb = self.make_block(new_values)

File ~\anaconda3\Lib\site-packages\pandas\core\dtypes\astype.py:299, in astype_array_safe(values, dtype, copy, errors)
    296     return values.copy()
    298 try:
--> 299     new_values = astype_array(values, dtype, copy=copy)
    300 except (ValueError, TypeError):
    301     # e.g. astype_nansafe can fail on object-dtype of strings
    302     # trying to convert to float
    303     if errors == "ignore":

File ~\anaconda3\Lib\site-packages\pandas\core\dtypes\astype.py:230, in astype_array(values, dtype, copy)
    227     values = values.astype(dtype, copy=copy)
    229 else:
--> 230     values = astype_nansafe(values, dtype, copy=copy)
    232 # in pandas we don't store numpy str dtypes, so convert to object
    233 if isinstance(dtype, np.dtype) and issubclass(values.dtype.type, str):

File ~\anaconda3\Lib\site-packages\pandas\core\dtypes\astype.py:170, in astype_nansafe(arr, dtype, copy, skipna)
    166     raise ValueError(msg)
    168 if copy or is_object_dtype(arr.dtype) or is_object_dtype(dtype):
    169     # Explicit copy, or required since NumPy can't view from / to object.
--> 170     return arr.astype(dtype, copy=True)
    172 return arr.astype(dtype, copy=copy)

ValueError: could not convert string to float: '$10,389'
I originally defined the make_graph function as follows:
def make_graph(stock_data, revenue_data, stock):
    fig = make_subplots(rows=2, cols=1, shared_xaxes=True, subplot_titles=("Historical Share Price", "Historical Revenue"), vertical_spacing = .3)
    stock_data_specific = stock_data[stock_data.Date <= '2021-06-14']
    revenue_data_specific = revenue_data[revenue_data.Date <= '2021-04-30']
    fig.add_trace(go.Scatter(x=pd.to_datetime(stock_data_specific.Date, infer_datetime_format=True), y=stock_data_specific.Close.astype("float"), name="Share Price"), row=1, col=1)
    fig.add_trace(go.Scatter(x=pd.to_datetime(revenue_data_specific.Date, infer_datetime_format=True), y=revenue_data_specific.Revenue.astype("float"), name="Revenue"), row=2, col=1)
    fig.update_xaxes(title_text="Date", row=1, col=1)
    fig.update_xaxes(title_text="Date", row=2, col=1)
    fig.update_yaxes(title_text="Price ($US)", row=1, col=1)
    fig.update_yaxes(title_text="Revenue ($US Millions)", row=2, col=1)
    fig.update_layout(showlegend=False,
                      height=900,
                      title=stock,
                      xaxis_rangeslider_visible=True)
    fig.show()
I even tried adding a preprocessing line inside the function above, but I kept getting the same error.
    # Preprocess the revenue data
    revenue_data_specific["Revenue"] = revenue_data_specific["Revenue"].str.replace(r',|\$', "").astype(float)
I had the same error when calling make_graph on the Tesla data and solved it with the line below. I tried a similar line for gme_data and gme_revenue, but it didn't help.
tesla_revenue["Revenue"] = tesla_revenue['Revenue'].str.replace(',|\$',"")
Any suggestions on how I can fix the error and call the make_graph function on gme_data and gme_revenue?
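One possible explanation, assuming pandas 2.0 or later is installed: Series.str.replace treats its pattern as a literal string unless regex=True is passed, so a pattern such as ',|\$' never matches and '$10,389' reaches astype(float) unchanged. Below is a minimal sketch of cleaning the Revenue column before calling make_graph. The helper name clean_revenue and the sample rows are hypothetical; only the '$10,389' value comes from the error message above.

```python
import pandas as pd

# Hypothetical stand-in for the scraped gme_revenue table; only '$10,389'
# is taken from the error message, the other rows are made up for illustration.
gme_revenue = pd.DataFrame({
    "Date": ["2021-01-31", "2020-10-31", "2020-07-31"],
    "Revenue": ["$10,389", "$1,021", ""],
})


def clean_revenue(df):
    """Return a copy whose Revenue column is numeric (hypothetical helper)."""
    out = df.copy()
    # regex=True is explicit: since pandas 2.0 the default is regex=False,
    # which would look for the literal text "[$,]" and replace nothing.
    out["Revenue"] = out["Revenue"].astype(str).str.replace(r"[$,]", "", regex=True)
    # Empty or otherwise unparseable cells become NaN instead of raising.
    out["Revenue"] = pd.to_numeric(out["Revenue"], errors="coerce")
    # Drop rows that have no usable revenue figure.
    return out.dropna(subset=["Revenue"])


cleaned = clean_revenue(gme_revenue)
print(cleaned)
```

With the column already numeric, the astype("float") call inside make_graph should no longer see strings like '$10,389', so make_graph(gme_data, cleaned, 'GameStop') would be the next thing to try.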
Messages In This Thread
Webscraping with beautifulsoup - by cormanstan - Aug-23-2023, 12:04 AM
RE: Webscraping with beautifulsoup - by snippsat - Aug-23-2023, 07:09 AM
RE: Webscraping with beautifulsoup - by cormanstan - Aug-24-2023, 12:02 AM
RE: Webscraping with beautifulsoup - by snippsat - Aug-24-2023, 11:57 AM
