Problem with calculation of market data
#1
Hello dear forum, my first post here.

I'm currently stuck and have been working on the same problem for days.

I am writing code that calculates certain parameters from market data. Right now I am trying to calculate the trading range of the Asian session, but I am running into the following problem.
As soon as I filter the Asian time span with a start time other than 00:00:00, the Asia HOTD/LOTD M15 values are no longer calculated or displayed (NaN), yet the times derived from them (Time Asia LOTD/HOTD M15) are, strangely enough, still correct. I need those values to calculate the difference and thus the trading range of the session.
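To make the behaviour easier to reproduce, here is a stripped-down sketch with made-up prices (column names as in my full script below) that shows the same NaN pattern:

import pandas as pd

# Made-up candles for a single date
df = pd.DataFrame({
    'Date': ['22/10/2002'] * 4,
    'Time': ['00:00:00', '00:15:00', '00:30:00', '00:45:00'],
    'High_hourly': [69.28, 69.25, 69.29, 69.28],
    'Low_hourly':  [69.25, 69.20, 69.215, 69.26],
})

# Session filter that does NOT start at the first candle of the day
asia = df[(df['Time'] >= '00:15:00') & (df['Time'] <= '00:45:00')]

# Assigning the filtered result back to the full frame leaves NaN on
# every row that is not in asia (here the 00:00:00 row)
df['Asia HOTD M15'] = asia.groupby('Date')['High_hourly'].transform('max')
print(df)  # row 0 shows NaN in 'Asia HOTD M15'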

More details on my data source:

- Time zone: EET
- Columns: Date, Time (15-minute intervals, i.e. 15-minute candles), Open, High, Low, Close, Volume. Important: the high/low of the day can occur in either the Low or the High column; the highest value is not always in High and the lowest is not always in Low (see the small sketch below).
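To make that quirk concrete, here is a tiny made-up example of how I take the day's high/low across both columns (the same approach I use in the full script further down):

import pandas as pd

# Two made-up candles where the day's true high sits in the Low column
df = pd.DataFrame({
    'Date': ['22/10/2002', '22/10/2002'],
    'High_hourly': [69.28, 69.29],
    'Low_hourly':  [69.31, 69.25],  # 69.31 is the real high of the day
})

# The extremes have to be taken across BOTH columns
day_high = df.groupby('Date')[['High_hourly', 'Low_hourly']].max().max(axis=1)  # 69.31
day_low  = df.groupby('Date')[['High_hourly', 'Low_hourly']].min().min(axis=1)  # 69.25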

Here is a sample of one day from the data source (15m data):
Output:
22/10/2002;00:00:00;69.28;69.28;69.25;69.25;0
22/10/2002;00:15:00;69.245;69.25;69.2;69.22;0
22/10/2002;00:30:00;69.235;69.29;69.215;69.28;0
22/10/2002;00:45:00;69.28;69.28;69.26;69.275;0
22/10/2002;01:00:00;69.29;69.3;69.275;69.29;0
22/10/2002;01:15:00;69.28;69.285;69.26;69.285;0
22/10/2002;01:30:00;69.29;69.32;69.26;69.275;0
22/10/2002;01:45:00;69.265;69.285;69.245;69.255;0
22/10/2002;02:00:00;69.255;69.26;69.21;69.23;0
22/10/2002;02:15:00;69.225;69.275;69.225;69.255;0
22/10/2002;02:30:00;69.26;69.29;69.235;69.235;0
22/10/2002;02:45:00;69.245;69.435;69.24;69.42;0
22/10/2002;03:00:00;69.43;69.495;69.42;69.435;0
22/10/2002;03:15:00;69.425;69.485;69.415;69.46;0
22/10/2002;03:30:00;69.465;69.495;69.375;69.46;0
22/10/2002;03:45:00;69.45;69.51;69.445;69.5;0
22/10/2002;04:00:00;69.51;69.62;69.51;69.585;0
22/10/2002;04:15:00;69.59;69.59;69.545;69.57;0
22/10/2002;04:30:00;69.575;69.605;69.56;69.595;0
22/10/2002;04:45:00;69.6;69.66;69.585;69.655;0
22/10/2002;05:00:00;69.66;69.67;69.6;69.61;0
22/10/2002;05:15:00;69.615;69.71;69.615;69.695;0
22/10/2002;05:30:00;69.69;69.69;69.63;69.675;0
22/10/2002;05:45:00;69.67;69.69;69.64;69.685;0
22/10/2002;06:00:00;69.68;69.68;69.585;69.59;0
22/10/2002;06:15:00;69.58;69.6;69.565;69.575;0
22/10/2002;06:30:00;69.585;69.585;69.515;69.53;0
22/10/2002;06:45:00;69.525;69.54;69.475;69.515;0
22/10/2002;07:00:00;69.505;69.51;69.485;69.505;0
22/10/2002;07:15:00;69.5;69.5;69.375;69.395;0
22/10/2002;07:30:00;69.405;69.465;69.385;69.42;0
22/10/2002;07:45:00;69.425;69.43;69.295;69.365;0
etc...
import pandas as pd

df_hourly = pd.read_csv(r"C:\Users\XXXX\OneDrive\Desktop\BACKTEST\audjpy\audjpy-15m-eet.csv", sep=';', header=None)
df_hourly = df_hourly.rename(columns={0: 'Date', 1: 'Time', 2: 'Open_hourly', 3: 'Low_hourly', 4: 'High_hourly', 5: 'Close_hourly'})

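# Every time a complete day must contain: 96 quarter-hour stamps from 00:00 to 23:45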
required_times = pd.date_range('00:00:00', '23:45:00', freq='15T').strftime('%H:%M:%S').tolist()

complete_hourly_dates = df_hourly['Date'][
    df_hourly.groupby('Date')['Time'].transform(
        lambda x: set(x).issuperset(required_times)
    )
]

df_daily = pd.read_csv(r"C:\Users\XXXX\OneDrive\Desktop\BACKTEST\audjpy\audjpy-1d.csv", sep=';', header=None)
df_daily = df_daily.rename(columns={0: 'Date', 1: 'Time', 2: 'Open_daily', 3: 'Low_daily', 4: 'High_daily', 5: 'Close_daily'})

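# Keep only dates for which every 15-minute candle is present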
df_hourly = df_hourly[df_hourly['Date'].isin(complete_hourly_dates)]

merged_df = pd.merge(df_hourly, df_daily[['Date', 'Open_daily', 'Close_daily']], on='Date', how='left', suffixes=('_hourly', '_daily'))

reference_df = df_daily[['Date', 'Low_daily', 'High_daily']]

merged_df = pd.merge(merged_df, reference_df, on='Date', how='left', suffixes=('_hourly', '_daily'))

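# Daily (D1) levels per date, taken from the merged daily columns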
merged_df['HOTD (D1)'] = merged_df.groupby('Date')['Low_daily'].transform('min')
merged_df['LOTD (D1)'] = merged_df.groupby('Date')['High_daily'].transform('max')

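# M15 day low/high taken across BOTH price columns, since the extreme can sit in either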
merged_df['LOTD (M15)'] = merged_df.groupby('Date')[['Low_hourly', 'High_hourly']].transform('min').min(axis=1)
merged_df['HOTD (M15)'] = merged_df.groupby('Date')[['Low_hourly', 'High_hourly']].transform('max').max(axis=1)

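# Row indices of each column's daily extreme (note: flatten() returns them in column order)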
index_hotd_m15 = merged_df.groupby('Date')[['Low_hourly', 'High_hourly']].idxmax().values.flatten()
index_lotd_m15 = merged_df.groupby('Date')[['High_hourly', 'Low_hourly']].idxmin().values.flatten()

time_hotd_m15 = merged_df.loc[index_hotd_m15, ['Date', 'Time']].values
time_lotd_m15 = merged_df.loc[index_lotd_m15, ['Date', 'Time']].values

time_hotd_m15_df = pd.DataFrame(time_hotd_m15, columns=['Date', 'Time_HOTD_M15'])
time_lotd_m15_df = pd.DataFrame(time_lotd_m15, columns=['Date', 'Time_LOTD_M15'])

merged_df = pd.merge(merged_df, time_hotd_m15_df, on='Date', how='left')
merged_df = pd.merge(merged_df, time_lotd_m15_df, on='Date', how='left')

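# Day range expressed in 0.01 steps (pips for a JPY-quoted pair)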
merged_df['M15_HOTD_LOTD_Diff'] = (merged_df['HOTD (M15)'] - merged_df['LOTD (M15)']) / 0.01

# THIS IS WHERE THE PROBLEM ARISES: the filter should run from 03:00:00 to 07:00:00,
# but as soon as the start time is anything other than 00:00:00,
# Asia HOTD/LOTD M15 is no longer displayed -- only NaN

asia_df = merged_df[(merged_df['Time'] >= '00:00:00') & (merged_df['Time'] <= '07:00:00')]

asia_lotd_m15 = asia_df.groupby('Date')[['Low_hourly', 'High_hourly']].transform('min').min(axis=1)
asia_hotd_m15 = asia_df.groupby('Date')[['High_hourly', 'Low_hourly']].transform('max').max(axis=1)

merged_df['Asia LOTD M15'] = asia_lotd_m15
merged_df['Asia HOTD M15'] = asia_hotd_m15

merged_df['Asia HOTD LOTD Diff'] = (merged_df['Asia HOTD M15'] - merged_df['Asia LOTD M15']) / 0.01

index_asia_hotd_m15 = asia_df.groupby('Date')[['Low_hourly', 'High_hourly']].idxmax().values.flatten()
index_asia_lotd_m15 = asia_df.groupby('Date')[['High_hourly', 'Low_hourly']].idxmin().values.flatten()

time_asia_hotd_m15 = merged_df.loc[index_asia_hotd_m15, ['Date', 'Time']].values
time_asia_lotd_m15 = merged_df.loc[index_asia_lotd_m15, ['Date', 'Time']].values

time_asia_hotd_m15_df = pd.DataFrame(time_asia_hotd_m15, columns=['Date', 'Time_Asia_HOTD_M15'])
time_asia_lotd_m15_df = pd.DataFrame(time_asia_lotd_m15, columns=['Date', 'Time_Asia_LOTD_M15'])

merged_df = pd.merge(merged_df, time_asia_hotd_m15_df, on='Date', how='left')
merged_df = pd.merge(merged_df, time_asia_lotd_m15_df, on='Date', how='left')

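# Attach the news calendar: AUD/JPY events of significance N/M/H, joined per date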
df_news = pd.read_csv(r"C:\Users\xxxx\OneDrive\Desktop\BACKTEST\News\neue_news_eet_formatted2.csv", sep=';', header=None)
df_news = df_news.rename(columns={0: 'Date', 1: 'Time', 2: 'Currency', 3: 'Significance', 4: 'NewsName'})

filtered_news = df_news[(df_news['Currency'].isin(['JPY', 'AUD'])) & (df_news['Significance'].isin(['N', 'M', 'H']))]

grouped_news = filtered_news.groupby('Date', as_index=False)

merged_df = pd.merge(merged_df, grouped_news.agg({'NewsName': lambda x: ';'.join(x),
                                                   'Significance': lambda x: ';'.join(x)}),
                     on='Date', how='left')

merged_df['NewsName'].replace('', pd.NA, inplace=True)
merged_df['Significance'].replace('', pd.NA, inplace=True)

merged_df['Weekday'] = pd.to_datetime(merged_df['Date'], format='%d/%m/%Y').dt.strftime('%A')

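# Collapse to one row per date; drop_duplicates keeps the FIRST row of each date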
merged_df = merged_df.drop_duplicates(subset=['Date'])

merged_df['Entry Count'] = range(1, len(merged_df) + 1)
merged_df.set_index('Entry Count', inplace=True)

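# Classify the day: high before low = Bearish, low before high = Bullish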
merged_df.loc[merged_df['Time_HOTD_M15'] < merged_df['Time_LOTD_M15'], 'Signal'] = 'Bearish'
merged_df.loc[merged_df['Time_HOTD_M15'] > merged_df['Time_LOTD_M15'], 'Signal'] = 'Bullish'
merged_df.loc[merged_df['Time_HOTD_M15'] == merged_df['Time_LOTD_M15'], 'Signal'] = 'Neutral'

pd.set_option('display.max_rows', 100)

print(merged_df[['Date', 'Weekday', 'Signal', 'HOTD (D1)', 'LOTD (D1)', 'HOTD (M15)', 'LOTD (M15)', 'M15_HOTD_LOTD_Diff', 'Time_HOTD_M15', 'Time_LOTD_M15', 'Asia HOTD M15', 'Asia LOTD M15', 'Asia HOTD LOTD Diff', 'Time_Asia_HOTD_M15', 'Time_Asia_LOTD_M15', 'NewsName', 'Significance']])
I also noticed something else that I just can't figure out:

Depending on the order in which I list Low_hourly and High_hourly, different values come out at the end. As written, everything comes out correctly, but shouldn't the code find the time of the highest value across Low_hourly/High_hourly and the time of the lowest value across Low_hourly/High_hourly regardless of the column order?

index_hotd_m15 = merged_df.groupby('Date')[['Low_hourly', 'High_hourly']].idxmax().values.flatten()
index_lotd_m15 = merged_df.groupby('Date')[['High_hourly', 'Low_hourly']].idxmin().values.flatten()
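Here is a tiny made-up example of the order dependence I mean; since the flattened array drives which row my script picks per date, the column order changes the final result:

import pandas as pd

# Made-up example: the two columns peak on different rows
df = pd.DataFrame({
    'Date': ['22/10/2002', '22/10/2002'],
    'Time': ['00:00:00', '00:15:00'],
    'Low_hourly':  [69.31, 69.25],
    'High_hourly': [69.28, 69.30],
})

# idxmax() is evaluated per column, so flatten() returns the row
# indices in column order -- swapping the columns swaps the order
print(df.groupby('Date')[['Low_hourly', 'High_hourly']].idxmax().values.flatten())  # [0 1]
print(df.groupby('Date')[['High_hourly', 'Low_hourly']].idxmax().values.flatten())  # [1 0]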
I would be infinitely grateful if someone could help me further.

Thanks in advance and sorry for the long text!
Larz60+ wrote Jan-12-2024, 10:01 PM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
This has been added for you this time. Please use BBCode tags in future posts.