Hello dear forum, my first post here.
I'm currently stuck and have been working on the same problem for days.
I'm writing code that calculates certain parameters from market data. At the moment I'm trying to calculate the trading range of the Asian session, but I'm running into the following problem...
As soon as I filter the Asian time span and the first value is not 00:00:00, the Asia HOTD/LOTD M15 values are no longer displayed or calculated (NaN). Strangely, the times derived from them (Time_Asia_LOTD/HOTD_M15) are still correct. I need the price values so that I can calculate the difference and thus obtain the trading range.
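To make the symptom easier to reproduce, here is a minimal sketch with made-up values (toy data, not my real file):

```python
import pandas as pd

# Toy frame standing in for my merged_df (hypothetical values)
merged_df = pd.DataFrame({
    'Date': ['22/10/2002'] * 4,
    'Time': ['00:00:00', '03:00:00', '05:00:00', '08:00:00'],
    'Low_hourly':  [69.25, 69.42, 69.60, 69.30],
    'High_hourly': [69.28, 69.50, 69.67, 69.36],
})

# Filtering from 03:00 drops the first row of the day ...
asia_df = merged_df[(merged_df['Time'] >= '03:00:00') & (merged_df['Time'] <= '07:00:00')]
asia_lotd = asia_df.groupby('Date')[['Low_hourly', 'High_hourly']].transform('min').min(axis=1)

# ... so assigning the subset-indexed Series back leaves NaN on the dropped rows
merged_df['Asia LOTD M15'] = asia_lotd
print(merged_df['Asia LOTD M15'])
# row 0 (00:00:00) and row 3 (08:00:00) come out as NaN
```

This is exactly the pattern my real code uses, just with two days' worth of columns and rows.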
More details on my data source:
- Time zone: EET
- Columns: Date, Time (15-minute intervals, representing 15-minute trading candles), Open, High, Low, Close, Volume (important: the high/low of the day can occur in either Low or High; the highest value is not always in High and the lowest is not always in Low)
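Because of that quirk, I take the daily extremes over both columns, roughly like this sketch (toy values, column names mirroring my renamed ones):

```python
import pandas as pd

# Hypothetical candles where the day's true high sits in the Low column
df = pd.DataFrame({
    'Date': ['22/10/2002', '22/10/2002'],
    'Low_hourly':  [69.25, 69.70],   # second row: "Low" holds the larger value
    'High_hourly': [69.28, 69.50],
})

# Per-day max/min of each column, then the extreme across both columns
day_high = df.groupby('Date')[['Low_hourly', 'High_hourly']].max().max(axis=1)
day_low = df.groupby('Date')[['Low_hourly', 'High_hourly']].min().min(axis=1)
print(day_high.iloc[0], day_low.iloc[0])  # 69.7 69.25
```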
Here is a sample of the data source of one day (15m data)
22/10/2002;00:00:00;69.28;69.28;69.25;69.25;0
22/10/2002;00:15:00;69.245;69.25;69.2;69.22;0
22/10/2002;00:30:00;69.235;69.29;69.215;69.28;0
22/10/2002;00:45:00;69.28;69.28;69.26;69.275;0
22/10/2002;01:00:00;69.29;69.3;69.275;69.29;0
22/10/2002;01:15:00;69.28;69.285;69.26;69.285;0
22/10/2002;01:30:00;69.29;69.32;69.26;69.275;0
22/10/2002;01:45:00;69.265;69.285;69.245;69.255;0
22/10/2002;02:00:00;69.255;69.26;69.21;69.23;0
22/10/2002;02:15:00;69.225;69.275;69.225;69.255;0
22/10/2002;02:30:00;69.26;69.29;69.235;69.235;0
22/10/2002;02:45:00;69.245;69.435;69.24;69.42;0
22/10/2002;03:00:00;69.43;69.495;69.42;69.435;0
22/10/2002;03:15:00;69.425;69.485;69.415;69.46;0
22/10/2002;03:30:00;69.465;69.495;69.375;69.46;0
22/10/2002;03:45:00;69.45;69.51;69.445;69.5;0
22/10/2002;04:00:00;69.51;69.62;69.51;69.585;0
22/10/2002;04:15:00;69.59;69.59;69.545;69.57;0
22/10/2002;04:30:00;69.575;69.605;69.56;69.595;0
22/10/2002;04:45:00;69.6;69.66;69.585;69.655;0
22/10/2002;05:00:00;69.66;69.67;69.6;69.61;0
22/10/2002;05:15:00;69.615;69.71;69.615;69.695;0
22/10/2002;05:30:00;69.69;69.69;69.63;69.675;0
22/10/2002;05:45:00;69.67;69.69;69.64;69.685;0
22/10/2002;06:00:00;69.68;69.68;69.585;69.59;0
22/10/2002;06:15:00;69.58;69.6;69.565;69.575;0
22/10/2002;06:30:00;69.585;69.585;69.515;69.53;0
22/10/2002;06:45:00;69.525;69.54;69.475;69.515;0
22/10/2002;07:00:00;69.505;69.51;69.485;69.505;0
22/10/2002;07:15:00;69.5;69.5;69.375;69.395;0
22/10/2002;07:30:00;69.405;69.465;69.385;69.42;0
22/10/2002;07:45:00;69.425;69.43;69.295;69.365;0
etc...
import pandas as pd

df_hourly = pd.read_csv(r"C:\Users\XXXX\OneDrive\Desktop\BACKTEST\audjpy\audjpy-15m-eet.csv", sep=';', header=None)
df_hourly = df_hourly.rename(columns={0: 'Date', 1: 'Time', 2: 'Open_hourly', 3: 'Low_hourly', 4: 'High_hourly', 5: 'Close_hourly'})

# Keep only dates with a complete set of 15-minute candles
required_times = pd.date_range('00:00:00', '23:45:00', freq='15T').strftime('%H:%M:%S').tolist()
complete_hourly_dates = df_hourly['Date'][
    df_hourly.groupby('Date')['Time'].transform(lambda x: set(x).issuperset(required_times))
]

df_daily = pd.read_csv(r"C:\Users\XXXX\OneDrive\Desktop\BACKTEST\audjpy\audjpy-1d.csv", sep=';', header=None)
df_daily = df_daily.rename(columns={0: 'Date', 1: 'Time', 2: 'Open_daily', 3: 'Low_daily', 4: 'High_daily', 5: 'Close_daily'})

df_hourly = df_hourly[df_hourly['Date'].isin(complete_hourly_dates)]

merged_df = pd.merge(df_hourly, df_daily[['Date', 'Open_daily', 'Close_daily']], on='Date', how='left', suffixes=('_hourly', '_daily'))
reference_df = df_daily[['Date', 'Low_daily', 'High_daily']]
merged_df = pd.merge(merged_df, reference_df, on='Date', how='left', suffixes=('_hourly', '_daily'))

merged_df['HOTD (D1)'] = merged_df.groupby('Date')['Low_daily'].transform('min')
merged_df['LOTD (D1)'] = merged_df.groupby('Date')['High_daily'].transform('max')
merged_df['LOTD (M15)'] = merged_df.groupby('Date')[['Low_hourly', 'High_hourly']].transform('min').min(axis=1)
merged_df['HOTD (M15)'] = merged_df.groupby('Date')[['Low_hourly', 'High_hourly']].transform('max').max(axis=1)

index_hotd_m15 = merged_df.groupby('Date')[['Low_hourly', 'High_hourly']].idxmax().values.flatten()
index_lotd_m15 = merged_df.groupby('Date')[['High_hourly', 'Low_hourly']].idxmin().values.flatten()
time_hotd_m15 = merged_df.loc[index_hotd_m15, ['Date', 'Time']].values
time_lotd_m15 = merged_df.loc[index_lotd_m15, ['Date', 'Time']].values
time_hotd_m15_df = pd.DataFrame(time_hotd_m15, columns=['Date', 'Time_HOTD_M15'])
time_lotd_m15_df = pd.DataFrame(time_lotd_m15, columns=['Date', 'Time_LOTD_M15'])
merged_df = pd.merge(merged_df, time_hotd_m15_df, on='Date', how='left')
merged_df = pd.merge(merged_df, time_lotd_m15_df, on='Date', how='left')

merged_df['M15_HOTD_LOTD_Diff'] = (merged_df['HOTD (M15)'] - merged_df['LOTD (M15)']) / 0.01

# ************************* THIS IS WHERE THE PROBLEM ARISES: 03:00:00 to 07:00:00 SHOULD WORK,
# BUT AS SOON AS ANYTHING OTHER THAN 00:00:00 IS THE START TIME, ASIA HOTD/LOTD M15 IS NO LONGER DISPLAYED (NaN)
asia_df = merged_df[(merged_df['Time'] >= '00:00:00') & (merged_df['Time'] <= '07:00:00')]

asia_lotd_m15 = asia_df.groupby('Date')[['Low_hourly', 'High_hourly']].transform('min').min(axis=1)
asia_hotd_m15 = asia_df.groupby('Date')[['High_hourly', 'Low_hourly']].transform('max').max(axis=1)
merged_df['Asia LOTD M15'] = asia_lotd_m15
merged_df['Asia HOTD M15'] = asia_hotd_m15
merged_df['Asia HOTD LOTD Diff'] = (merged_df['Asia HOTD M15'] - merged_df['Asia LOTD M15']) / 0.01

index_asia_hotd_m15 = asia_df.groupby('Date')[['Low_hourly', 'High_hourly']].idxmax().values.flatten()
index_asia_lotd_m15 = asia_df.groupby('Date')[['High_hourly', 'Low_hourly']].idxmin().values.flatten()
time_asia_hotd_m15 = merged_df.loc[index_asia_hotd_m15, ['Date', 'Time']].values
time_asia_lotd_m15 = merged_df.loc[index_asia_lotd_m15, ['Date', 'Time']].values
time_asia_hotd_m15_df = pd.DataFrame(time_asia_hotd_m15, columns=['Date', 'Time_Asia_HOTD_M15'])
time_asia_lotd_m15_df = pd.DataFrame(time_asia_lotd_m15, columns=['Date', 'Time_Asia_LOTD_M15'])
merged_df = pd.merge(merged_df, time_asia_hotd_m15_df, on='Date', how='left')
merged_df = pd.merge(merged_df, time_asia_lotd_m15_df, on='Date', how='left')

df_news = pd.read_csv(r"C:\Users\xxxx\OneDrive\Desktop\BACKTEST\News\neue_news_eet_formatted2.csv", sep=';', header=None)
df_news = df_news.rename(columns={0: 'Date', 1: 'Time', 2: 'Currency', 3: 'Significance', 4: 'NewsName'})
filtered_news = df_news[(df_news['Currency'].isin(['JPY', 'AUD'])) & (df_news['Significance'].isin(['N', 'M', 'H']))]
grouped_news = filtered_news.groupby('Date', as_index=False)
merged_df = pd.merge(merged_df, grouped_news.agg({'NewsName': lambda x: ';'.join(x), 'Significance': lambda x: ';'.join(x)}), on='Date', how='left')
merged_df['NewsName'].replace('', pd.NA, inplace=True)
merged_df['Significance'].replace('', pd.NA, inplace=True)

merged_df['Weekday'] = pd.to_datetime(merged_df['Date'], format='%d/%m/%Y').dt.strftime('%A')
merged_df = merged_df.drop_duplicates(subset=['Date'])
merged_df['Entry Count'] = range(1, len(merged_df) + 1)
merged_df.set_index('Entry Count', inplace=True)

merged_df.loc[merged_df['Time_HOTD_M15'] < merged_df['Time_LOTD_M15'], 'Signal'] = 'Bearish'
merged_df.loc[merged_df['Time_HOTD_M15'] > merged_df['Time_LOTD_M15'], 'Signal'] = 'Bullish'
merged_df.loc[merged_df['Time_HOTD_M15'] == merged_df['Time_LOTD_M15'], 'Signal'] = 'Neutral'

pd.set_option('display.max_rows', 100)
print(merged_df[['Date', 'Weekday', 'Signal', 'HOTD (D1)', 'LOTD (D1)', 'HOTD (M15)', 'LOTD (M15)', 'M15_HOTD_LOTD_Diff', 'Time_HOTD_M15', 'Time_LOTD_M15', 'Asia HOTD M15', 'Asia LOTD M15', 'Asia HOTD LOTD Diff', 'Time_Asia_HOTD_M15', 'Time_Asia_LOTD_M15', 'NewsName', 'Significance']])
I also noticed something else that I just can't figure out: depending on whether I swap the positions of Low_hourly and High_hourly, different values come out at the end. As it stands, everything comes out correctly, but shouldn't the code find the time of the highest value across Low_hourly/High_hourly, and the time of the lowest value across Low_hourly/High_hourly, regardless of the column order?
index_hotd_m15 = merged_df.groupby('Date')[['Low_hourly', 'High_hourly']].idxmax().values.flatten()
index_lotd_m15 = merged_df.groupby('Date')[['High_hourly', 'Low_hourly']].idxmin().values.flatten()
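What I actually expect is the timestamp of the row-wise extreme, which should be order-independent. Here is a sketch of what I mean (toy data, hypothetical column names mirroring mine):

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['22/10/2002'] * 3,
    'Time': ['00:00:00', '00:15:00', '00:30:00'],
    'Low_hourly':  [69.25, 69.20, 69.40],
    'High_hourly': [69.28, 69.25, 69.29],
})

# Collapse the two columns into row-wise extremes first,
# so the column order can no longer matter ...
row_high = df[['Low_hourly', 'High_hourly']].max(axis=1)
row_low = df[['Low_hourly', 'High_hourly']].min(axis=1)

# ... then take idxmax/idxmin of those single Series per day
idx_high = row_high.groupby(df['Date']).idxmax()
idx_low = row_low.groupby(df['Date']).idxmin()
print(df.loc[idx_high, 'Time'].tolist())  # ['00:30:00']
print(df.loc[idx_low, 'Time'].tolist())   # ['00:15:00']
```

My real code instead calls idxmax/idxmin on both columns at once and flattens the result, which gives two indices per day, and that is where the order seems to creep in.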
Thanks in advance and sorry for the long text!
Larz60+ wrote Jan-12-2024, 10:01 PM:
Please post all code, output and errors (in its entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
This has been added for you this time. Please use BBCode tags on future posts.