Problem with calculation of market data
#1
Hello dear forum, my first post here.

I'm currently stuck and have been working on the same problem for days.

I am writing code that calculates certain parameters from market data. Right now I am trying to calculate the trading range of the Asian session, but I am running into the following problem.
As soon as I filter the Asian time span with a start time other than 00:00:00, the Asia HOTD/LOTD M15 values are no longer calculated or displayed (NaN), yet the times derived from them (Time Asia LOTD/HOTD M15) are, strangely enough, still correct. I need those values to calculate the difference and thus the trading range of the session.
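To make the behaviour easier to reproduce, here is a stripped-down sketch with made-up prices (column names as in my full script below) that shows the same NaN pattern:

import pandas as pd

# Made-up candles for a single date
df = pd.DataFrame({
    'Date': ['22/10/2002'] * 4,
    'Time': ['00:00:00', '00:15:00', '00:30:00', '00:45:00'],
    'High_hourly': [69.28, 69.25, 69.29, 69.28],
    'Low_hourly':  [69.25, 69.20, 69.215, 69.26],
})

# Session filter that does NOT start at the first candle of the day
asia = df[(df['Time'] >= '00:15:00') & (df['Time'] <= '00:45:00')]

# Assigning the filtered result back to the full frame leaves NaN on
# every row that is not in asia (here the 00:00:00 row)
df['Asia HOTD M15'] = asia.groupby('Date')['High_hourly'].transform('max')
print(df)  # row 0 shows NaN in 'Asia HOTD M15'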

More details on my data source:

- Time zone: EET
- Columns: Date, Time (15-minute intervals, i.e. 15-minute candles), Open, High, Low, Close, Volume. Important: the high/low of the day can occur in either the Low or the High column; the highest value is not always in High and the lowest is not always in Low (see the small sketch below).
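To make that quirk concrete, here is a tiny made-up example of how I take the day's high/low across both columns (the same approach I use in the full script further down):

import pandas as pd

# Two made-up candles where the day's true high sits in the Low column
df = pd.DataFrame({
    'Date': ['22/10/2002', '22/10/2002'],
    'High_hourly': [69.28, 69.29],
    'Low_hourly':  [69.31, 69.25],  # 69.31 is the real high of the day
})

# The extremes have to be taken across BOTH columns
day_high = df.groupby('Date')[['High_hourly', 'Low_hourly']].max().max(axis=1)  # 69.31
day_low  = df.groupby('Date')[['High_hourly', 'Low_hourly']].min().min(axis=1)  # 69.25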

Here is a sample of one day from the data source (15m data):
Output:
22/10/2002;00:00:00;69.28;69.28;69.25;69.25;0
22/10/2002;00:15:00;69.245;69.25;69.2;69.22;0
22/10/2002;00:30:00;69.235;69.29;69.215;69.28;0
22/10/2002;00:45:00;69.28;69.28;69.26;69.275;0
22/10/2002;01:00:00;69.29;69.3;69.275;69.29;0
22/10/2002;01:15:00;69.28;69.285;69.26;69.285;0
22/10/2002;01:30:00;69.29;69.32;69.26;69.275;0
22/10/2002;01:45:00;69.265;69.285;69.245;69.255;0
22/10/2002;02:00:00;69.255;69.26;69.21;69.23;0
22/10/2002;02:15:00;69.225;69.275;69.225;69.255;0
22/10/2002;02:30:00;69.26;69.29;69.235;69.235;0
22/10/2002;02:45:00;69.245;69.435;69.24;69.42;0
22/10/2002;03:00:00;69.43;69.495;69.42;69.435;0
22/10/2002;03:15:00;69.425;69.485;69.415;69.46;0
22/10/2002;03:30:00;69.465;69.495;69.375;69.46;0
22/10/2002;03:45:00;69.45;69.51;69.445;69.5;0
22/10/2002;04:00:00;69.51;69.62;69.51;69.585;0
22/10/2002;04:15:00;69.59;69.59;69.545;69.57;0
22/10/2002;04:30:00;69.575;69.605;69.56;69.595;0
22/10/2002;04:45:00;69.6;69.66;69.585;69.655;0
22/10/2002;05:00:00;69.66;69.67;69.6;69.61;0
22/10/2002;05:15:00;69.615;69.71;69.615;69.695;0
22/10/2002;05:30:00;69.69;69.69;69.63;69.675;0
22/10/2002;05:45:00;69.67;69.69;69.64;69.685;0
22/10/2002;06:00:00;69.68;69.68;69.585;69.59;0
22/10/2002;06:15:00;69.58;69.6;69.565;69.575;0
22/10/2002;06:30:00;69.585;69.585;69.515;69.53;0
22/10/2002;06:45:00;69.525;69.54;69.475;69.515;0
22/10/2002;07:00:00;69.505;69.51;69.485;69.505;0
22/10/2002;07:15:00;69.5;69.5;69.375;69.395;0
22/10/2002;07:30:00;69.405;69.465;69.385;69.42;0
22/10/2002;07:45:00;69.425;69.43;69.295;69.365;0
etc...
import pandas as pd

df_hourly = pd.read_csv(r"C:\Users\XXXX\OneDrive\Desktop\BACKTEST\audjpy\audjpy-15m-eet.csv", sep=';', header=None)
df_hourly = df_hourly.rename(columns={0: 'Date', 1: 'Time', 2: 'Open_hourly', 3: 'Low_hourly', 4: 'High_hourly', 5: 'Close_hourly'})

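# Every time a complete day must contain: 96 quarter-hour stamps from 00:00 to 23:45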
required_times = pd.date_range('00:00:00', '23:45:00', freq='15T').strftime('%H:%M:%S').tolist()

complete_hourly_dates = df_hourly['Date'][
    df_hourly.groupby('Date')['Time'].transform(
        lambda x: set(x).issuperset(required_times)
    )
]

df_daily = pd.read_csv(r"C:\Users\XXXX\OneDrive\Desktop\BACKTEST\audjpy\audjpy-1d.csv", sep=';', header=None)
df_daily = df_daily.rename(columns={0: 'Date', 1: 'Time', 2: 'Open_daily', 3: 'Low_daily', 4: 'High_daily', 5: 'Close_daily'})

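# Keep only dates for which every 15-minute candle is present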
df_hourly = df_hourly[df_hourly['Date'].isin(complete_hourly_dates)]

merged_df = pd.merge(df_hourly, df_daily[['Date', 'Open_daily', 'Close_daily']], on='Date', how='left', suffixes=('_hourly', '_daily'))

reference_df = df_daily[['Date', 'Low_daily', 'High_daily']]

merged_df = pd.merge(merged_df, reference_df, on='Date', how='left', suffixes=('_hourly', '_daily'))

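# Daily (D1) levels per date, taken from the merged daily columns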
merged_df['HOTD (D1)'] = merged_df.groupby('Date')['Low_daily'].transform('min')
merged_df['LOTD (D1)'] = merged_df.groupby('Date')['High_daily'].transform('max')

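# M15 day low/high taken across BOTH price columns, since the extreme can sit in either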
merged_df['LOTD (M15)'] = merged_df.groupby('Date')[['Low_hourly', 'High_hourly']].transform('min').min(axis=1)
merged_df['HOTD (M15)'] = merged_df.groupby('Date')[['Low_hourly', 'High_hourly']].transform('max').max(axis=1)

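# Row indices of each column's daily extreme (note: flatten() returns them in column order)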
index_hotd_m15 = merged_df.groupby('Date')[['Low_hourly', 'High_hourly']].idxmax().values.flatten()
index_lotd_m15 = merged_df.groupby('Date')[['High_hourly', 'Low_hourly']].idxmin().values.flatten()

time_hotd_m15 = merged_df.loc[index_hotd_m15, ['Date', 'Time']].values
time_lotd_m15 = merged_df.loc[index_lotd_m15, ['Date', 'Time']].values

time_hotd_m15_df = pd.DataFrame(time_hotd_m15, columns=['Date', 'Time_HOTD_M15'])
time_lotd_m15_df = pd.DataFrame(time_lotd_m15, columns=['Date', 'Time_LOTD_M15'])

merged_df = pd.merge(merged_df, time_hotd_m15_df, on='Date', how='left')
merged_df = pd.merge(merged_df, time_lotd_m15_df, on='Date', how='left')

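# Day range expressed in 0.01 steps (pips for a JPY-quoted pair)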
merged_df['M15_HOTD_LOTD_Diff'] = (merged_df['HOTD (M15)'] - merged_df['LOTD (M15)']) / 0.01

# THIS IS WHERE THE PROBLEM ARISES: the filter should run from 03:00:00 to 07:00:00,
# but as soon as the start time is anything other than 00:00:00,
# Asia HOTD/LOTD M15 is no longer displayed -- only NaN

asia_df = merged_df[(merged_df['Time'] >= '00:00:00') & (merged_df['Time'] <= '07:00:00')]

asia_lotd_m15 = asia_df.groupby('Date')[['Low_hourly', 'High_hourly']].transform('min').min(axis=1)
asia_hotd_m15 = asia_df.groupby('Date')[['High_hourly', 'Low_hourly']].transform('max').max(axis=1)

merged_df['Asia LOTD M15'] = asia_lotd_m15
merged_df['Asia HOTD M15'] = asia_hotd_m15

merged_df['Asia HOTD LOTD Diff'] = (merged_df['Asia HOTD M15'] - merged_df['Asia LOTD M15']) / 0.01

index_asia_hotd_m15 = asia_df.groupby('Date')[['Low_hourly', 'High_hourly']].idxmax().values.flatten()
index_asia_lotd_m15 = asia_df.groupby('Date')[['High_hourly', 'Low_hourly']].idxmin().values.flatten()

time_asia_hotd_m15 = merged_df.loc[index_asia_hotd_m15, ['Date', 'Time']].values
time_asia_lotd_m15 = merged_df.loc[index_asia_lotd_m15, ['Date', 'Time']].values

time_asia_hotd_m15_df = pd.DataFrame(time_asia_hotd_m15, columns=['Date', 'Time_Asia_HOTD_M15'])
time_asia_lotd_m15_df = pd.DataFrame(time_asia_lotd_m15, columns=['Date', 'Time_Asia_LOTD_M15'])

merged_df = pd.merge(merged_df, time_asia_hotd_m15_df, on='Date', how='left')
merged_df = pd.merge(merged_df, time_asia_lotd_m15_df, on='Date', how='left')

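# Attach the news calendar: AUD/JPY events of significance N/M/H, joined per date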
df_news = pd.read_csv(r"C:\Users\xxxx\OneDrive\Desktop\BACKTEST\News\neue_news_eet_formatted2.csv", sep=';', header=None)
df_news = df_news.rename(columns={0: 'Date', 1: 'Time', 2: 'Currency', 3: 'Significance', 4: 'NewsName'})

filtered_news = df_news[(df_news['Currency'].isin(['JPY', 'AUD'])) & (df_news['Significance'].isin(['N', 'M', 'H']))]

grouped_news = filtered_news.groupby('Date', as_index=False)

merged_df = pd.merge(merged_df, grouped_news.agg({'NewsName': lambda x: ';'.join(x),
                                                   'Significance': lambda x: ';'.join(x)}),
                     on='Date', how='left')

merged_df['NewsName'].replace('', pd.NA, inplace=True)
merged_df['Significance'].replace('', pd.NA, inplace=True)

merged_df['Weekday'] = pd.to_datetime(merged_df['Date'], format='%d/%m/%Y').dt.strftime('%A')

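# Collapse to one row per date; drop_duplicates keeps the FIRST row of each date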
merged_df = merged_df.drop_duplicates(subset=['Date'])

merged_df['Entry Count'] = range(1, len(merged_df) + 1)
merged_df.set_index('Entry Count', inplace=True)

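# Classify the day: high before low = Bearish, low before high = Bullish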
merged_df.loc[merged_df['Time_HOTD_M15'] < merged_df['Time_LOTD_M15'], 'Signal'] = 'Bearish'
merged_df.loc[merged_df['Time_HOTD_M15'] > merged_df['Time_LOTD_M15'], 'Signal'] = 'Bullish'
merged_df.loc[merged_df['Time_HOTD_M15'] == merged_df['Time_LOTD_M15'], 'Signal'] = 'Neutral'

pd.set_option('display.max_rows', 100)

print(merged_df[['Date', 'Weekday', 'Signal', 'HOTD (D1)', 'LOTD (D1)', 'HOTD (M15)', 'LOTD (M15)', 'M15_HOTD_LOTD_Diff', 'Time_HOTD_M15', 'Time_LOTD_M15', 'Asia HOTD M15', 'Asia LOTD M15', 'Asia HOTD LOTD Diff', 'Time_Asia_HOTD_M15', 'Time_Asia_LOTD_M15', 'NewsName', 'Significance']])
I also noticed something else that I just can't figure out:

Depending on the order in which I list Low_hourly and High_hourly, different values come out at the end. As written, everything comes out correctly, but shouldn't the code find the time of the highest value across Low_hourly/High_hourly and the time of the lowest value across Low_hourly/High_hourly regardless of the column order?

index_hotd_m15 = merged_df.groupby('Date')[['Low_hourly', 'High_hourly']].idxmax().values.flatten()
index_lotd_m15 = merged_df.groupby('Date')[['High_hourly', 'Low_hourly']].idxmin().values.flatten()
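Here is a tiny made-up example of the order dependence I mean; since the flattened array drives which row my script picks per date, the column order changes the final result:

import pandas as pd

# Made-up example: the two columns peak on different rows
df = pd.DataFrame({
    'Date': ['22/10/2002', '22/10/2002'],
    'Time': ['00:00:00', '00:15:00'],
    'Low_hourly':  [69.31, 69.25],
    'High_hourly': [69.28, 69.30],
})

# idxmax() is evaluated per column, so flatten() returns the row
# indices in column order -- swapping the columns swaps the order
print(df.groupby('Date')[['Low_hourly', 'High_hourly']].idxmax().values.flatten())  # [0 1]
print(df.groupby('Date')[['High_hourly', 'Low_hourly']].idxmax().values.flatten())  # [1 0]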
I would be infinitely grateful if someone could help me further.

Thanks in advance and sorry for the long text!
Larz60+ wrote Jan-12-2024, 10:01 PM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
This has been added for you this time. Please use BBCode tags in future posts.