Python Forum
Pandas replace function not working on DataFrame with floats

Pandas replace function not working on DataFrame with floats - bcrypto - Apr-12-2021

Hi,

I am using Pandas' replace function. It works on one DataFrame but not on another, and I don't understand why. I have tried several approaches, but nothing seems to work. Please take a look at the example code below:

import pandas as pd
import talib as ta

rslt_df = pd.DataFrame()
test_df = pd.DataFrame()

df = pd.DataFrame({'Close': [1, 2.3842, 5.389132, 4.09712373, 5.90845, 6.0981234, 7, 8, 9]})

# ROC-------------------------------------------------
for i in range(2):
    if i != 0:
        rslt_df['roc%i' % i] = ta.ROC(df['Close'].shift(1), timeperiod=i)

# replace values in rslt_df
print("rslt_df:\n", rslt_df)
print("dtypes= ", rslt_df.dtypes)
# rslt_df = rslt_df.fillna(0)
rslt_df = rslt_df.replace(to_replace=[138.420000, 126.035232, 44.209704], value=[11111, 22222, 33333])  # This one does NOT replace all of them
print("rslt_df:\n", rslt_df)

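# replace values in df (result stored in test_df)---------------------
# test_df is still the empty DataFrame at this point, so this prints an empty Series
print("dtypes2= ", test_df.dtypes)
test_df = df.replace(to_replace=[1, 9], value=[33333, 44444])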

print("test_df:\n", test_df)
# Output
rslt_df:
          roc1
0         NaN
1         NaN
2  138.420000
3  126.035232
4  -23.974330
5   44.209704
6    3.210206
7   14.789412
8   14.285714
dtypes=  roc1    float64
dtype: object
rslt_df:
            roc1
0           NaN
1           NaN
2  11111.000000
3    126.035232
4    -23.974330
5     44.209704
6      3.210206
7     14.789412
8     14.285714
dtypes2=  Series([], dtype: object)
test_df:
           Close
0  33333.000000
1      2.384200
2      5.389132
3      4.097124
4      5.908450
5      6.098123
6      7.000000
7      8.000000
8  44444.000000
* Update: I added prints of each DataFrame's dtypes. The one that works shows an empty Series([]) and the one that does not work shows float64. I am not sure whether this is the problem, and I don't know why the two differ, but it feels like a little progress ;-/

**Update: I had a slight error in the code: I was not assigning the result of replace back to rslt_df. I thought that was the issue, but the replacement still fails for some of the floats. Notice in the new output that the replacement works when the float does not have too many decimals, so that has to be the issue! Ex: 138.420000 gets replaced but 126.035232 and 44.209704 do not.
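
The decimals hypothesis above can be checked without pandas or talib. The following is only a minimal sketch in plain Python: the rate of change is written out by hand as (close / previous close - 1) * 100, which is an assumption about what ta.ROC computes, and the variable names are made up for illustration. It suggests that the six decimals in the printout are a rounded view of a longer float, so an exact comparison against the printed number fails while a rounded or tolerance-based comparison succeeds.

```python
import math

# Hand-written rate of change in percent (assumed to mirror ta.ROC with timeperiod=1).
prev_close, close = 2.3842, 5.389132
value = (close / prev_close - 1) * 100

# Six decimals, like the DataFrame printout.
shown = f"{value:.6f}"

# Exact comparison against the printed number (assumed to be what replace() does for floats).
exact_match = value == 126.035232

# Two ways to compare that ignore the digits beyond the sixth decimal.
rounded_match = round(value, 6) == 126.035232
close_match = math.isclose(value, 126.035232, rel_tol=0.0, abs_tol=1e-6)

print(shown, repr(value))
print(exact_match, rounded_match, close_match)
```

The repr shows the digits that the six-decimal display hides.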


RE: Pandas replace function not working on a specific dataFrame but working on another. - bcrypto - Apr-12-2021

Found the solution!

rslt_df = rslt_df.round(6) cleans up what I believe is a rounding issue, and now it works:
import pandas as pd
import talib as ta
import numpy as np

rslt_df = pd.DataFrame()
test_df = pd.DataFrame()

df = pd.DataFrame({'Close': [1, 2.3842, 5.389132, 4.09712373, 5.90845, 6.0981234, 7, 8, 9]})

# ROC-------------------------------------------------
for i in range(2):
    if i != 0:
        rslt_df['roc%i' % i] = ta.ROC(df['Close'].shift(1), timeperiod=i)

# replace values in rslt_df
print("rslt_df:\n", rslt_df)
print("dtypes= ", rslt_df.dtypes)
rslt_df = rslt_df.fillna(0)

# rslt_df = rslt_df.roc1.astype(int)
rslt_df = rslt_df.round(6)  # *****SOLUTION*****


rslt_df = rslt_df.replace(to_replace=[138.42, 126.035232, 44.209704], value=[11111, 22222, 33333])  # After rounding, all three values are replaced

# rslt_df = rslt_df.mask(np.isclose(df.values, 126.035232))


print("rslt_df:\n", rslt_df)

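# replace values in df (result stored in test_df)---------------------
# test_df is still the empty DataFrame at this point, so this prints an empty Series
print("dtypes2= ", test_df.dtypes)
test_df = df.replace(to_replace=[1, 9], value=[33333, 44444])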

print("test_df:\n", test_df)
# OUTPUT
rslt_df:
          roc1
0         NaN
1         NaN
2  138.420000
3  126.035232
4  -23.974330
5   44.209704
6    3.210206
7   14.789412
8   14.285714
dtypes=  roc1    float64
dtype: object
rslt_df:
            roc1
0      0.000000
1      0.000000
2  11111.000000
3  22222.000000
4    -23.974330
5  33333.000000
6      3.210206
7     14.789412
8     14.285714
dtypes2=  Series([], dtype: object)
test_df:
           Close
0  33333.000000
1      2.384200
2      5.389132
3      4.097124
4      5.908450
5      6.098123
6      7.000000
7      8.000000
8  44444.000000
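
The commented-out np.isclose line in the code above points at an alternative to rounding the whole DataFrame: match each target within a small tolerance and replace only those rows. This is a rough sketch, not a tested drop-in. It avoids talib by computing the rate of change by hand (an assumption about what ta.ROC returns), and the targets dict and the 1e-6 tolerance are illustrative choices.

```python
import numpy as np
import pandas as pd

close = pd.Series([1, 2.3842, 5.389132, 4.09712373, 5.90845, 6.0981234, 7, 8, 9])

# Stand-in for ta.ROC(close.shift(1), timeperiod=1): percent change of the shifted series.
prev = close.shift(1)
rslt_df = pd.DataFrame({"roc1": (prev / prev.shift(1) - 1) * 100})

# Printed value -> replacement (hypothetical mapping, taken from the thread's example).
targets = {138.42: 11111, 126.035232: 22222, 44.209704: 33333}

original = rslt_df["roc1"].to_numpy()
replaced = rslt_df["roc1"].copy()
for target, new_value in targets.items():
    # True where the full-precision value is within 1e-6 of the printed one; NaN never matches.
    is_match = np.isclose(original, target, rtol=0.0, atol=1e-6)
    replaced = replaced.mask(is_match, new_value)

rslt_df["roc1"] = replaced
print(rslt_df)
```

Unlike round(6), this leaves every other value at full precision, which may matter if the column is used in later calculations.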