Why replace treats an integer value 999 as 999.0?
Python Forum (https://python-forum.io) - Forum: Python Coding - General Coding Help
Thread: Why replace treats an integer value 999 as 999.0? (/thread-24333.html)

Why replace treats an integer value 999 as 999.0? - new_to_python - Feb-09-2020

Hi, I came across an example that replaces the sentinel value -999 with NaN. -999 is an integer. How come it can be used to replace the -999.0 elements in the Series?

    import numpy as np
    import pandas as pd

    In [39]: data = pd.Series([1., -999., 2., -999., -1000., 3.])

    In [40]: data
    Out[40]:
    0       1.0
    1    -999.0
    2       2.0
    3    -999.0
    4   -1000.0
    5       3.0
    dtype: float64

    In [41]: data.replace(-999, np.nan)
    Out[41]:
    0       1.0
    1       NaN
    2       2.0
    3       NaN
    4   -1000.0
    5       3.0
    dtype: float64

I did a test by typing:

    -999 == -999.

The interpreter gave True. Does the replace method automatically and implicitly cast decimal values to the nearest integer? I can't find this mentioned in ?pd.Series.replace.

I have seen elsewhere that an integer (e.g. 3) is used interchangeably with its decimal counterpart (e.g. 3.). Why? The integer 3 is not the same as 3.000000000000000000001. Is there a rule here?

RE: Why replace treats an integer value 999 as 999.0? - karkas - Feb-09-2020

I haven't used pandas, but what == does is test whether both literals have the same binary representation, which is why it's not a good idea to use it to test float equality after some calculations (not your case right now, of course). Now, seeing this, I am assuming that 999. and 999 actually have the same internal binary representation. This would be a very interesting thing to check. Try it if it's possible. The replace() function is probably affected by this.
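
The check suggested above can be run with the standard library alone. This is a minimal sketch, assuming CPython with IEEE 754 doubles; the variable names are illustrative. It packs the integer -999 as a 64-bit signed integer and the float -999.0 as a double, then compares both the raw bytes and the values.

```python
import struct

int_value = -999
float_value = -999.0

# Raw 8-byte encodings: two's-complement integer vs. IEEE 754 double.
int_bytes = struct.pack('<q', int_value)
float_bytes = struct.pack('<d', float_value)

same_bytes = int_bytes == float_bytes   # compares the stored encodings
same_value = int_value == float_value   # compares the numeric values

print(int_bytes.hex(), float_bytes.hex())
print(same_bytes, same_value)
```

The byte strings differ while the comparison is still True: == on an int and a float compares their mathematical values, not their storage. That is consistent with the replace output shown in the question, where -999 matches the -999.0 entries.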

RE: Why replace treats an integer value 999 as 999.0? - snippsat - Feb-09-2020

It just replaces the value; it does not convert/cast to anything else. So -999 can, for example, be replaced by a string.

    >>> data = pd.Series([1., -999., 2., -999., -1000., 3.])
    >>> data.replace(-999, 'hello')
    0        1
    1    hello
    2        2
    3    hello
    4    -1000
    5        3
    dtype: object

The Series now detects that some values are strings and sets the dtype to object (str or mixed).

new_to_python Wrote: Why? integer 3 is not the same as 3.000000000000000000001. Is there a rule here?

They are not the same, as integer and float have different characteristics. They will be equal when compared if they are close.

    >>> 3 == 3.0000000000000001
    True
    >>> 3 is 3.0000000000000001
    False
    >>> 3 == 3.0000001
    False

Look at Floating Point Arithmetic and Basic Answers.

    >>> 0.1 * 3
    0.30000000000000004

RE: Why replace treats an integer value 999 as 999.0? - new_to_python - Feb-09-2020

Thanks. I am still not sure about two things.

Quote: >>> data = pd.Series([1., -999., 2., -999., -1000., 3.])

The interpreter uses -999, the first argument of the replace method, as a key to match against the elements of the series called data and so determine which elements are to be replaced. The second element in the series is -999. (which is the same as -999.0), and that is not the same as the integer -999, because -999.0 is a float.

Regarding "They will be equal when compared if they are close.": in Python, how close do they have to be in order to be considered equal? Can the programmer set the tolerance/threshold? In C, I often compare two floating point/double values using:

    if (fabs(a - b) < threshold) {
        printf("The two floating point values are the same\n");
    }

where the threshold can be 1e-6 or 1e-8 or whatever, depending on the application, and can be determined experimentally in advance.

So in Python, I don't need to do that explicitly, and as long as two numbers (both float/double, or one float/double and one integer) are close, they are considered to be the same?
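
For the tolerance question, the standard library function math.isclose takes explicit relative and absolute tolerances, which is close to the C fabs idiom. This is a minimal sketch; the values and thresholds are illustrative assumptions, not recommendations.

```python
import math

a = 0.1 * 3
b = 0.3

exact_equal = a == b                  # == applies no tolerance at all
close_default = math.isclose(a, b)    # default relative tolerance, rel_tol=1e-09
close_abs = math.isclose(a, b, rel_tol=0.0, abs_tol=1e-8)  # absolute tolerance only
manual = abs(a - b) < 1e-8            # direct translation of the C test

print(exact_equal, close_default, close_abs, manual)
```

Plain == never applies a threshold, so closeness has to be requested explicitly, either with math.isclose or with a hand-written comparison like the last line.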

RE: Why replace treats an integer value 999 as 999.0? - snippsat - Feb-09-2020

(Feb-09-2020, 08:45 PM)new_to_python Wrote: So in Python, I don't need to do that explicitly and as long as two numbers (could both be float/double or one float/double and one integer) are close, they are considered to be the same?

They have to be very close; remove one 0 and it's False.

    >>> 3 == 3.000000000000001
    False

For better control over how close, look at math.isclose(); numpy also has numpy.isclose. There is also the decimal module, which gives better control over precision, for example for financial calculations or for output like a calculator would give.

    >>> 0.1 * 3
    0.30000000000000004
    >>> from decimal import Decimal
    >>>
    >>> result = Decimal('0.1') * Decimal('3')
    >>> result
    Decimal('0.3')
    >>> print(result)
    0.3

Pandas is a big beast of its own and can have other rules.

RE: Why replace treats an integer value 999 as 999.0? - karkas - Feb-10-2020

About why you are getting True when doing -999. == -999: note that, although these two literals have different types, their actual value (binary representation) is the same, and that's what == checks. When you do 3.000000000000000000000000001 == 3, you'll get True, because floating point precision is limited. To check this, you can go to https://www.h-schmidt.net/FloatConverter/IEEE754.html and do some tests.
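
The limited precision mentioned above can also be checked locally. This is a minimal sketch, assuming IEEE 754 doubles (what CPython uses on common platforms): doubles between 2 and 4 are spaced 2**-51 apart, so a decimal string closer to 3 than half that spacing is parsed as exactly 3.0. The variable names are illustrative.

```python
import sys

# Gap between adjacent doubles in the range [2, 4): 2 * 2**-52 = 2**-51.
spacing_near_3 = 2 * sys.float_info.epsilon

tiny = float('3.0000000000000001')   # 1e-16 above 3: less than half the spacing
small = float('3.000000000000001')   # 1e-15 above 3: more than two spacings

print(spacing_near_3)
print(tiny == 3.0, tiny.hex())
print(small == 3.0, small.hex())
```

So 3 == 3.0000000000000001 is True because the literal is rounded to 3.0 when it is parsed, before any comparison happens, not because == tolerates small differences.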

RE: Why replace treats an integer value 999 as 999.0? - new_to_python - Feb-11-2020

(Feb-10-2020, 06:06 AM)karkas Wrote: About why you are getting

Thank you. Is there a way to display the binary representation of numbers in Python/pandas?

RE: Why replace treats an integer value 999 as 999.0? - DeaD_EyE - Feb-11-2020

    import struct


    def double_to_bin(value):
        """
        Convert a float to its double (64 bit) binary representation

        Please check here: http://www.binaryconvert.com/result_double.html
        """
        # Pack as a little-endian double, then rebuild the 64-bit integer
        # byte by byte (byte 0 is the least significant).
        value_sum = sum(
            byte << (shift * 8)
            for shift, byte in enumerate(struct.pack('<d', value))
        )
        bin_str = f'{value_sum:064b}'
        return {
            'dec': value,
            'hex': f'{value_sum:016x}',  # 64 bits = 16 hex digits
            'bin': bin_str,
            'sign': bin_str[0] == '1',
            'exponent': bin_str[1:12],
            'mantissa': bin_str[12:],
        }

Better you check it online if the result is right. I checked it with 1.0 and 0.3.

RE: Why replace treats an integer value 999 as 999.0? - new_to_python - Feb-11-2020

(Feb-11-2020, 02:54 PM)DeaD_EyE Wrote: Better you check it online if the result is right.

Thanks. In this case, 999 and 999.0 produce the same binary representation, which is:

    '0100000010001111001110000000000000000000000000000000000000000000'

So Python treats them as the same?

RE: Why replace treats an integer value 999 as 999.0? - new_to_python - Feb-14-2020

Hi DeaD_EyE, the binary representation of 999 is:

    1111100111

How come your code produced a very long binary representation which is different from 1111100111?
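
The two bit strings answer different questions: 1111100111 is 999 written as a plain binary integer, while the 64-character string is the IEEE 754 double encoding of 999.0 (1 sign bit, 11 exponent bits, 52 fraction bits). This is a minimal sketch that decodes those fields and recovers 999; the variable names are illustrative.

```python
import struct

int_bits = bin(999)[2:]   # 999 as a plain binary integer

# Reinterpret the 8 bytes of the double 999.0 as an unsigned 64-bit integer.
(raw,) = struct.unpack('<Q', struct.pack('<d', 999.0))
double_bits = f'{raw:064b}'

sign = int(double_bits[0], 2)
exponent = int(double_bits[1:12], 2) - 1023   # remove the exponent bias
fraction = int(double_bits[12:], 2) / 2**52   # fractional part of the significand

# Normalized double: (-1)**sign * 1.fraction * 2**exponent
value = (-1) ** sign * (1 + fraction) * 2 ** exponent

# The 'd' format converts an int argument to a double before packing.
same_packing = struct.pack('<d', 999) == struct.pack('<d', 999.0)

print(int_bits)
print(double_bits)
print(sign, exponent, value, same_packing)
```

The fraction field starts with 111100111, the digits of 1111100111 after its leading 1, and the exponent records that the binary point was moved nine places. Passing 999 or 999.0 to double_to_bin gives the same output because struct.pack('<d', ...) converts the integer to a double first, so the function never sees the integer's own storage.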