Python Forum
Why does replace treat an integer value 999 as 999.0?
#1
Hi, I came across an example that replaces the sentinel value -999 with NaN. -999 is an integer. How come it can be used to replace the -999.0 elements in the Series?

import numpy as np
import pandas as pd

In [39]: data = pd.Series([1., -999., 2., -999., -1000., 3.])                                                         

In [40]: data                                                                                                         
Out[40]: 
0       1.0
1    -999.0
2       2.0
3    -999.0
4   -1000.0
5       3.0
dtype: float64

In [41]: data.replace(-999, np.nan)                                                                                   
Out[41]: 
0       1.0
1       NaN
2       2.0
3       NaN
4   -1000.0
5       3.0
dtype: float64
I did a test by typing:
-999 == -999.
The interpreter gave True. Does the "replace" method implicitly cast the decimal values to the nearest integer? I can't find this mentioned in
 ?pd.Series.replace 
I have seen elsewhere that an integer (e.g. 3) is used interchangeably with its decimal counterpart (e.g. 3.). Why? The integer 3 is not the same as 3.000000000000000000001. Is there a rule here?
#2
I haven't used pandas, but for numbers == compares values, not types or binary representations, so the int 999 and the float 999. compare equal because 999 is exactly representable as a float. The comparison is exact, which is also why it's not a good idea to use == to test float equality after some calculations (not your case right now, of course): rounding errors make the results differ slightly. It would still be interesting to check how 999. and 999 are each represented internally. Try it if it's possible.

The replace() function probably relies on the same comparison.
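A quick way to try this is sketched below. Note that struct.pack('<q', ...) is only an illustrative 64-bit integer encoding chosen for the comparison; it is not how CPython actually stores an int internally.

```python
import struct

i, f = 999, 999.0

# == compares the numeric values, so the int and the float are equal.
same_value = (i == f)
# The types are still different.
same_type = type(i) is type(f)

# Encode each one in 8 bytes: a 64-bit integer vs. an IEEE 754 double.
int_bytes = struct.pack('<q', i)
float_bytes = struct.pack('<d', f)
same_bytes = (int_bytes == float_bytes)

print(same_value)   # True
print(same_type)    # False
print(same_bytes)   # False
```

So the two compare equal even though their encodings differ.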
#3
It just replaces the value; it does not convert or cast to anything else.
So -999 can, for example, be replaced by a string.
>>> data = pd.Series([1., -999., 2., -999., -1000., 3.]) 
>>> data.replace(-999, 'hello')
0        1
1    hello
2        2
3    hello
4    -1000
5        3
dtype: object
The Series now detects that some values are strings and sets the dtype to object (str or mixed).
new_to_python Wrote:Why? integer 3 is not the same as 3.000000000000000000001. Is there a rule here?
They are not the same, as integers and floats have different characteristics.
They compare equal when the float literal is so close to the integer that it rounds to exactly the same value.
>>> 3 == 3.0000000000000001
True
>>> 3 is 3.0000000000000001
False
>>> 3 == 3.0000001
False
Look at the floating point arithmetic "Basic Answers":
>>> 0.1 * 3
0.30000000000000004
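To see what the literals above actually become, float.hex() prints the exact stored value. This is only a small sketch; the variable names a and b are mine.

```python
a = 3.0000000000000001   # too close to 3 to be stored separately
b = 3.0000001            # far enough away to get its own float

print(repr(a))           # 3.0
print(a.hex())           # 0x1.8000000000000p+1
print((3.0).hex())       # 0x1.8000000000000p+1
print(a == 3)            # True
print(b == 3)            # False
```

The literal a is rounded to exactly 3.0 when the source is parsed, so the comparison with 3 is between two identical values rather than between two "close" ones.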
#4
Thanks. I think I am not sure about two things.

Quote:>>> data = pd.Series([1., -999., 2., -999., -1000., 3.])
>>> data.replace(-999, 'hello')

The interpreter uses -999, the first argument of the replace method, as a key to match against the elements of the Series called data and determine which element(s) to replace. The second element in the Series is -999. (the same as -999.0), which is a float and so not the same as the integer -999. Regarding "They will be equal if compare as they are close.": in Python, how close do they have to be in order to be considered equal? Can the programmer set the tolerance/threshold?

In C, I often compare two floating point/double values using:

if (fabs(a-b) < threshold)
{
    printf("The two floating point values are the same\n");
}

where the threshold can be 1e-6 or 1e-8 or whatever, depending on the application, and can be determined experimentally beforehand. So in Python, I don't need to do that explicitly, and as long as two numbers (both float/double, or one float/double and one integer) are close, they are considered to be the same?
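For reference, the C check above can be written almost literally in Python. The helper name nearly_equal and the default threshold are assumptions made for this sketch. As far as I know, Python's == on numbers is an exact comparison with no built-in tolerance, so the threshold test still has to be explicit.

```python
def nearly_equal(a, b, threshold=1e-6):
    """Hypothetical helper mirroring the C test fabs(a - b) < threshold."""
    return abs(a - b) < threshold


x = 0.1 * 3

print(x == 0.3)                 # False: == is exact, no tolerance applied
print(nearly_equal(x, 0.3))     # True: within the chosen threshold
print(-999 == -999.0)           # True: both are exactly the same value
```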
#5
(Feb-09-2020, 08:45 PM)new_to_python Wrote: So in Python, I don't need to do that explicitly and as long as two numbers (could both be float/double or one float/double and one integer) are close, they are considered to be the same?
They have to be very close: remove one 0 and it's False.
>>> 3 == 3.000000000000001
False
For better control over how close, look at math.isclose(); numpy also has numpy.isclose.
There is also the decimal module, which gives better control over precision,
for example for financial calculations or to get the output a calculator would show.
>>> 0.1 * 3
0.30000000000000004
>>> from decimal import Decimal
>>> 
>>> result = Decimal('0.1') * Decimal('3')
>>> result
Decimal('0.3')
>>> print(result)
0.3
Pandas is its own big beast and can have other rules.
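A rough sketch of how the tolerance idea could be applied to the Series from the first post. The extra element -999.0000001 and the tolerance 1e-6 are made up for illustration, and using Series.mask with numpy.isclose is just one possible approach, not something replace does by itself.

```python
import math

import numpy as np
import pandas as pd

# Same data as the first post, but one sentinel is slightly off.
data = pd.Series([1., -999., 2., -999.0000001, -1000., 3.])

# replace() matches exact values only, so the near-miss is kept.
exact = data.replace(-999, np.nan)

# Build a boolean mask of values within 1e-6 of the sentinel,
# then blank those positions out with NaN.
near = np.isclose(data.to_numpy(), -999, rtol=0.0, atol=1e-6)
tolerant = data.mask(near)

print(int(exact.isna().sum()))      # 1
print(int(tolerant.isna().sum()))   # 2

# For two plain numbers, math.isclose does the same job.
print(math.isclose(0.1 * 3, 0.3))   # True
```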
#6
About why you are getting True when doing -999 == -999.: although these two literals have different types, their numeric value is the same, and that's what == checks. When you do 3.000000000000000000000000001 == 3, you'll get True, because floating point precision is limited and the literal is rounded to exactly 3.0. To check this, you can go to https://www.h-schmidt.net/FloatConverter/IEEE754.html and do some tests.
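The limited precision can also be checked locally. The sketch below assumes the usual IEEE 754 double, where floats between 2 and 4 are spaced 2 * sys.float_info.epsilon apart; the name gap_near_3 is mine.

```python
import sys

eps = sys.float_info.epsilon    # gap between 1.0 and the next float
gap_near_3 = 2 * eps            # spacing of floats in the range [2, 4)

# A full gap away from 3.0 is a different float.
print(3.0 + gap_near_3 == 3.0)        # False
# A quarter of a gap is rounded back to 3.0.
print(3.0 + gap_near_3 / 4 == 3.0)    # True

# The literals from this thread behave the same way.
print(3.0000000000000001 == 3)        # True
print(3.000000000000001 == 3)         # False
```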
#7
(Feb-10-2020, 06:06 AM)karkas Wrote: About why you are getting True when doing -999 == -999.: although these two literals have different types, their numeric value is the same, and that's what == checks. When you do 3.000000000000000000000000001 == 3, you'll get True, because floating point precision is limited and the literal is rounded to exactly 3.0. To check this, you can go to https://www.h-schmidt.net/FloatConverter/IEEE754.html and do some tests.

Thank you. Is there a way to display the binary representation of numbers in Python/pandas?
#8
import struct


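def double_to_bin(value):
    """
    Convert a float to its double (64-bit) binary representation.
    Please check here: http://www.binaryconvert.com/result_double.html
    """
    # Pack as a little-endian double, then rebuild the 64-bit integer
    # by shifting each byte into place (byte 0 is the least significant).
    value_sum = sum(
        byte << (shift * 8)
        for shift, byte in
        enumerate(struct.pack('<d', value))
    )
    bin_str = f'{value_sum:064b}'
    return {
        'dec': value,
        'hex': f'{value_sum:016x}',  # 64 bits are 16 hex digits
        'bin': bin_str,
        'sign': bin_str[0] == '1',   # True for negative numbers
        'exponent': bin_str[1:12],   # 11 exponent bits
        'mantissa': bin_str[12:],    # 52 fraction bits
    }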
It's best to check online whether the result is right.
I checked it with 1.0 and 0.3.
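A shorter variant of the same idea, in case it helps: struct.unpack with the '<Q' format reads the eight packed bytes back as one unsigned 64-bit integer, so no manual shifting is needed. The function name double_bits is made up for this sketch.

```python
import struct


def double_bits(value):
    """Return the 64 bits of value stored as an IEEE 754 double."""
    # Pack as a little-endian double, reinterpret as an unsigned 64-bit int.
    (as_int,) = struct.unpack('<Q', struct.pack('<d', value))
    return f'{as_int:064b}'


bits = double_bits(999.0)
print(bits[0])       # sign bit: 0
print(bits[1:12])    # exponent bits: 10000001000
print(bits[12:21])   # first fraction bits: 111100111

# An int argument is converted to a double before packing.
print(double_bits(999) == double_bits(999.0))   # True
```

That last line is worth keeping in mind: passing an int gives the same bits as the float because struct.pack('<d', ...) converts it first.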
#9
Thanks. In this case, 999 and 999.0 produce the same binary representation, which is:

'0100000010001111001110000000000000000000000000000000000000000000'

So Python treats them as the same?
#10
Hi DeaD_EyE, the binary representation of 999 is: 1111100111

How come your code produced a very long binary representation that is different from 1111100111?
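One possible explanation, sketched under the assumption that the long string is the IEEE 754 double layout (1 sign bit, 11 exponent bits with a bias of 1023, 52 fraction bits with an implicit leading 1). The code in this thread packs its argument with the 'd' (double) format, so even the integer 999 is shown as a double rather than as a plain binary integer.

```python
import struct

# 64 bits of 999.0 stored as a double.
(as_int,) = struct.unpack('<Q', struct.pack('<d', 999.0))
bits = f'{as_int:064b}'

sign = int(bits[0])
exponent = int(bits[1:12], 2) - 1023         # remove the bias
fraction = bits[12:]
significand = 1 + int(fraction, 2) / 2**52   # implicit leading 1

value = (-1) ** sign * significand * 2 ** exponent

print(exponent)              # 9
print(significand)           # 1.951171875
print(value)                 # 999.0

# The plain integer digits are the leading 1 plus the first fraction bits.
print(bin(999))              # 0b1111100111
print('1' + fraction[:9])    # 1111100111
```

So 1111100111 is still in there: it is the implicit 1 followed by the first nine fraction bits, scaled by 2**9.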