Is there a more elegant way to concatenate data frames?

Is there a more elegant way to concatenate data frames? - db042190 - Jun-12-2023

Hi, I'm picturing a script that wraps everything between the setting of "today" and the "print" in the prototype below in a loop. The loop will read symbols from a Notepad text file line by line, and each symbol read will replace the hardcoded Ticker settings you see below. For the first record, the download call where AAPL is now a placeholder would run. For every later record, the WMT placeholder call would run instead.

The download method allows passing multiple tickers in one call, but long term I think one call per symbol will be more flexible. From what I can tell, if you don't pass multiple symbols in a call, you don't get a column for Ticker.

Can I make this more elegant? It seems awkward.

import yfinance as yf
import pandas as pd
from datetime import date

today = date.today()

Ticker="AAPL"
data1 = yf.download(Ticker, start="2023-05-01", end=today).round(2)
data1["Ticker"]=Ticker

Ticker="WMT"
data2 = yf.download(Ticker, start="2023-05-01", end=today).round(2)
data2["Ticker"]=Ticker

data1=[data1,data2]
data1 = pd.concat(data1)

print(data1)
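
The file-reading part of the plan described above could look roughly like this. It is only a minimal sketch: the file name tickers.txt and the helper read_tickers are made up for illustration, and the file is assumed to hold one symbol per line. A temporary file stands in for the real one so the snippet runs on its own.

```python
import tempfile
from pathlib import Path


def read_tickers(path):
    """Return the non-empty lines of a text file as upper-case symbols."""
    symbols = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            symbol = line.strip().upper()
            if symbol:  # skip blank lines
                symbols.append(symbol)
    return symbols


# Demo with a throwaway file; the name "tickers.txt" is an assumption.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "tickers.txt"
    sample.write_text("AAPL\nwmt\n\n", encoding="utf-8")
    tickers = read_tickers(sample)

print(tickers)  # ['AAPL', 'WMT']
```

The resulting list can then drive the download loop in place of the hardcoded Ticker assignments.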

RE: Is there a more elegant way to concatenate data frames? - deanhystad - Jun-13-2023

Looks fine to me.

import yfinance as yf
import pandas as pd
from datetime import date, timedelta

tickers = ("AAPL", "WMT")  # Or read from file
today = date.today()
start = today - timedelta(days=7)
data = None
for ticker in tickers:
    x = yf.download(ticker, start=start, end=today, progress=False).round(2)
    x.insert(0, "Ticker", ticker)
    if data is None:
        data = x
    else:
        data = pd.concat((data, x))
data = data.sort_index()
print(data)

Output:
           Ticker    Open    High     Low   Close  Adj Close    Volume
Date
2023-06-06   AAPL  179.97  180.12  177.43  179.21     179.21  64848400
2023-06-06    WMT  149.70  150.19  148.51  149.78     149.78   5005200
2023-06-07   AAPL  178.44  181.21  177.32  177.82     177.82  61944600
2023-06-07    WMT  149.25  150.36  149.04  150.00     150.00   8085500
2023-06-08   AAPL  177.90  180.84  177.46  180.57     180.57  50214900
2023-06-08    WMT  150.39  152.43  149.79  152.17     152.17   6291000
2023-06-09   AAPL  181.50  182.23  180.63  180.96     180.96  48870700
2023-06-09    WMT  152.16  153.72  151.60  153.09     153.09   5201300
2023-06-12   AAPL  181.27  183.89  180.97  183.79     183.79  54274900
2023-06-12    WMT  153.43  154.30  153.17  154.10     154.10   4904500

I moved the ticker column. I think it makes more sense to place it ahead of the financial information. I also sorted the resulting table by the date index and changed the starting date to a calculation instead of a string. Just for fun.
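
A common variation on the loop above is to collect the per-ticker frames in a list and call pd.concat once at the end, which removes the "data is None" check. The sketch below assumes that approach; fake_download is a hypothetical stand-in for yf.download that returns made-up prices, so the snippet runs without network access.

```python
import pandas as pd


def fake_download(ticker):
    """Hypothetical stand-in for yf.download(); returns made-up prices."""
    base = {"AAPL": 180.0, "WMT": 150.0}[ticker]
    index = pd.DatetimeIndex(["2023-06-12", "2023-06-13"], name="Date")
    return pd.DataFrame({"Close": [base, base + 1.0]}, index=index)


tickers = ("AAPL", "WMT")  # Or read from file
frames = []
for ticker in tickers:
    x = fake_download(ticker)
    x.insert(0, "Ticker", ticker)  # label the rows before combining
    frames.append(x)

# One concat call for all frames, then sort by the Date index.
data = pd.concat(frames).sort_index()
print(data)
```

With many symbols this also avoids copying the growing frame on every pass through the loop.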

RE: Is there a more elegant way to concatenate data frames? - db042190 - Jun-13-2023

much more elegant. thank you.

RE: Is there a more elegant way to concatenate data frames? - snippsat - Jun-13-2023

Some tips about dates in Pandas. If you look at the output, Date sits on a lower line than the other column headers, and that needs a fix.
Here I have removed the datetime import and used Pandas' own date functionality.
Either works fine, but once Pandas is imported there is no need for an additional import of datetime.

import yfinance as yf
import pandas as pd

tickers = ("AAPL", "WMT")  # Or read from file
today = pd.to_datetime("today")
start = today - pd.Timedelta(days=7)
data = None
for ticker in tickers:
    x = yf.download(ticker, start=start, end=today, progress=False).round(2)
    x.insert(0, "Ticker", ticker)
    if data is None:
        data = x
    else:
        data = pd.concat((data, x))
data = data.sort_index()
print(data)

>>> data
           Ticker    Open    High     Low   Close  Adj Close    Volume
Date                                                                  
2023-06-06   AAPL  179.97  180.12  177.43  179.21     179.21  64848400
2023-06-06    WMT  149.70  150.19  148.51  149.78     149.78   5005200
2023-06-07   AAPL  178.44  181.21  177.32  177.82     177.82  61944600
2023-06-07    WMT  149.25  150.36  149.04  150.00     150.00   8085500
2023-06-08   AAPL  177.90  180.84  177.46  180.57     180.57  50214900
2023-06-08    WMT  150.39  152.43  149.79  152.17     152.17   6291000
2023-06-09   AAPL  181.50  182.23  180.63  180.96     180.96  48870700
2023-06-09    WMT  152.16  153.72  151.60  153.09     153.09   5201300
2023-06-12   AAPL  181.27  183.89  180.97  183.79     183.79  54274900
2023-06-12    WMT  153.43  154.30  153.17  154.10     154.10   4904500
2023-06-13   AAPL  182.80  184.15  182.47  183.13     183.13  27582874
2023-06-13    WMT  154.52  155.49  154.07  155.40     155.40   1844848

>>> data.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 12 entries, 2023-06-06 to 2023-06-13
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Ticker     12 non-null     object 
 1   Open       12 non-null     float64
 2   High       12 non-null     float64
 3   Low        12 non-null     float64
 4   Close      12 non-null     float64
 5   Adj Close  12 non-null     float64
 6   Volume     12 non-null     int64  
dtypes: float64(5), int64(1), object(1)
memory usage: 768.0+ bytes

So info() lists no Date column, because Date is the index (the DatetimeIndex line) rather than a regular column. To fix this:

>>> data = data.reset_index()
>>> data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 8 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Date       12 non-null     datetime64[ns]
 1   Ticker     12 non-null     object        
 2   Open       12 non-null     float64       
 3   High       12 non-null     float64       
 4   Low        12 non-null     float64       
 5   Close      12 non-null     float64       
 6   Adj Close  12 non-null     float64       
 7   Volume     12 non-null     int64         
dtypes: datetime64[ns](1), float64(5), int64(1), object(1)
memory usage: 900.0+ bytes
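
The index-versus-column difference can also be checked directly, without reading the info() output. A small sketch with made-up prices (only the layout matters here):

```python
import pandas as pd

# Made-up rows; Date is the index, as in the downloaded data.
data = pd.DataFrame(
    {"Ticker": ["AAPL", "WMT"], "Close": [183.79, 154.10]},
    index=pd.DatetimeIndex(["2023-06-12", "2023-06-12"], name="Date"),
)

print(data.index.name)         # Date
print("Date" in data.columns)  # False

# reset_index() moves the index into an ordinary column.
flat = data.reset_index()
print("Date" in flat.columns)  # True
print(list(flat.columns))      # ['Date', 'Ticker', 'Close']
```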

So now we have a working DataFrame, and Date shows up as datetime64[ns].
Then we can, for example, plot High and Low against Date for the last 90 days using seaborn.

import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

tickers = ("AAPL", "WMT")  # Or read from file
today = pd.to_datetime("today")
start = today - pd.Timedelta(days=90)
data = None
for ticker in tickers:
    x = yf.download(ticker, start=start, end=today, progress=False).round(2)
    x.insert(0, "Ticker", ticker)
    if data is None:
        data = x
    else:
        data = pd.concat((data, x))
data = data.sort_index()
#print(data)
data = data.reset_index()
# Plot
plt.figure(figsize=(15, 6))
sns.set_style("darkgrid")
sns.lineplot(data=data, x='Date', y='High', label='High')
sns.lineplot(data=data, x='Date', y='Low', label='Low')
plt.xlabel('Date')
plt.ylabel('Price')
plt.title('High and Low Stock Prices')
plt.legend()
plt.show()