Posts: 39
Threads: 12
Joined: Feb 2020
Feb-09-2020, 05:49 PM
(This post was last modified: Feb-09-2020, 05:49 PM by new_to_python.)
Hi, I came across an example that replaces the sentinel value 999 with NaN. 999 is an integer. How come it can be used to replace the 999.0 elements in the Series?
import numpy as np
import pandas as pd
In [39]: data = pd.Series([1., 999., 2., 999., 1000., 3.])
In [40]: data
Out[40]:
0 1.0
1 999.0
2 2.0
3 999.0
4 1000.0
5 3.0
dtype: float64
In [41]: data.replace(999, np.nan)
Out[41]:
0 1.0
1 NaN
2 2.0
3 NaN
4 1000.0
5 3.0
dtype: float64
I did a test by typing 999 == 999. and the interpreter gave True. Does the "replace" method implicitly cast the decimal values to the nearest integer? I can't find this mentioned in ?pd.Series.replace.
I have seen elsewhere that an integer (e.g. 3) is used interchangeably with its decimal counterpart (e.g. 3.). Why? The integer 3 is not the same as 3.000000000000000000001. Is there a rule here?
Posts: 22
Threads: 5
Joined: Aug 2019
I haven't used pandas, but what == does is test whether both literals have the same binary representation, which is why it's not a good idea to use it to test float equality after some calculations (not your case right now, of course). Seeing this, I am assuming that 999. and 999 actually have the same internal binary representation. That would be a very interesting thing to check. Try it if possible.
The replace() function is probably affected by this.
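One way to run the check suggested above, as a rough sketch using only the standard library. The variable names are made up for illustration, and the int is simply formatted as a 64-bit pattern for comparison (Python ints are not actually stored that way). It suggests the bit patterns are not the same: the int 999 and the float 999.0 look different in memory, and == compares their numeric values.

```python
import struct

i = 999
f = 999.0

# The int written out as a 64-bit pattern (illustrative width only)
int_bits = format(i, "064b")

# The float's IEEE 754 double pattern: pack as a double, reread as an unsigned 64-bit int
float_bits = format(struct.unpack(">Q", struct.pack(">d", f))[0], "064b")

print(int_bits)
print(float_bits)
print(int_bits == float_bits)  # the two patterns differ
print(i == f)                  # yet the values compare equal
print(type(i), type(f))
```

So the equality seems to come from comparing the numbers themselves, not from the two objects sharing a representation.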
Posts: 5,642
Threads: 112
Joined: Sep 2016
It just replaces the value; it does not convert/cast to anything else.
So 999 can, e.g., be replaced by a string.
>>> data = pd.Series([1., 999., 2., 999., 1000., 3.])
>>> data.replace(999, 'hello')
0 1
1 hello
2 2
3 hello
4 1000
5 3
dtype: object
The Series now detects that some values are strings and sets the dtype to object (str or mixed).
new_to_python Wrote: Why? integer 3 is not the same as 3.000000000000000000001. Is there a rule here?
They are not the same, as integers and floats have different characteristics.
They will compare equal if they are close enough.
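A small sketch of what the matching step appears to amount to, assuming pandas and numpy are installed. It compares replace() with an explicit element-wise == followed by Series.mask(); this only illustrates the observable behaviour and is not a statement about how replace() is implemented internally.

```python
import numpy as np
import pandas as pd

data = pd.Series([1., 999., 2., 999., 1000., 3.])

# Element-wise comparison: the int 999 compares equal to the float 999.0
matches = data == 999
print(matches.tolist())

# Blanking the matched positions by hand gives the same Series as replace()
by_mask = data.mask(matches, np.nan)
by_replace = data.replace(999, np.nan)
print(by_mask.equals(by_replace))
print(by_replace.dtype)  # still float64, nothing was cast to int
```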
>>> 3 == 3.0000000000000001
True
>>> 3 is 3.0000000000000001
False
>>> 3 == 3.0000001
False
Look at floating point arithmetic: Basic Answers
>>> 0.1 * 3
0.30000000000000004
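To see why some of those literals compare equal to 3 and others don't, here is a minimal sketch with the standard library. Decimal(float) prints the exact value a float stores, so it shows which literals collapse onto 3.0 when parsed. No tolerance is applied by ==; the rounding happens when the literal is converted to a double.

```python
import sys
from decimal import Decimal

# Decimal(float) shows the exact value the float actually stores
print(Decimal(3.0000000000000001))  # this literal is stored as exactly 3
print(Decimal(3.000000000000001))   # one zero fewer: a different, nearby double
print(Decimal(3.0000001))

# Gap between 1.0 and the next representable double (about 2.2e-16)
print(sys.float_info.epsilon)

same = 3 == 3.0000000000000001      # True: the literal rounds to 3.0
different = 3 == 3.000000000000001  # False: the literal is a distinct double
print(same, different)
```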
Posts: 39
Threads: 12
Joined: Feb 2020
Feb-09-2020, 08:45 PM
(This post was last modified: Feb-09-2020, 08:45 PM by new_to_python.)
Thanks. I think I am not sure about two things.
Quote:>>> data = pd.Series([1., 999., 2., 999., 1000., 3.])
>>> data.replace(999, 'hello')
The interpreter uses 999, the first argument of the replace method, as a key to match against the elements of the Series called data and determine which element(s) to replace. The second element in the Series is 999. (the same as 999.0), which is not the same as the integer 999, since 999.0 is a float.
You said they will be equal when compared because they are close. In Python, how close do they have to be in order to be considered equal? Can the programmer set the tolerance/threshold?
In C, I often compare two floating point/double values using:
if (fabs(a - b) < threshold)
{
    printf("The two floating point values are the same\n");
}
where the threshold can be 1e-6 or 1e-8 or whatever suits the application, and can be determined experimentally beforehand. So in Python, I don't need to do that explicitly, and as long as two numbers (both float/double, or one float/double and one integer) are close, they are considered to be the same?
Posts: 5,642
Threads: 112
Joined: Sep 2016
Feb-09-2020, 11:15 PM
(This post was last modified: Feb-09-2020, 11:15 PM by snippsat.)
(Feb-09-2020, 08:45 PM)new_to_python Wrote: So in Python, I don't need to do that explicitly and as long as two numbers (could both be float/double or one float/double and one integer) are close, they are considered to be the same?
They have to be very close; remove one 0 and it's False.
>>> 3 == 3.000000000000001
False
For better control over how close, look at math.isclose(); numpy also has numpy.isclose.
There is also a decimal module that gives better control over precision,
e.g. for financial calculations or output like a calculator would give.
>>> 0.1 * 3
0.30000000000000004
>>> from decimal import Decimal
>>>
>>> result = Decimal('0.1') * Decimal('3')
>>> result
Decimal('0.3')
>>> print(result)
0.3
Pandas is its own big beast and can have other rules.
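For the tolerance question, a short sketch of math.isclose() next to the C-style absolute check. The values and the threshold are arbitrary examples; rel_tol and abs_tol are the keyword arguments math.isclose() takes (numpy.isclose has similar rtol and atol arguments).

```python
import math

a = 0.1 * 3
b = 0.3

exact = a == b                                   # plain ==, no tolerance
close_default = math.isclose(a, b)               # default rel_tol=1e-09
close_tight = math.isclose(a, b, rel_tol=1e-20)  # tolerance too tight to pass

# A relative tolerance alone never matches against 0.0; abs_tol covers that case
near_zero_rel = math.isclose(1e-12, 0.0)
near_zero_abs = math.isclose(1e-12, 0.0, abs_tol=1e-9)

# The C-style check from the question, written by hand
threshold = 1e-6
c_style = abs(a - b) < threshold

print(exact, close_default, close_tight)
print(near_zero_rel, near_zero_abs, c_style)
```

So the tolerance is something you ask for explicitly; == itself never applies one.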
Posts: 22
Threads: 5
Joined: Aug 2019
Feb-10-2020, 06:06 AM
(This post was last modified: Feb-10-2020, 06:07 AM by karkas.)
About why you are getting True when doing 999. == 999: note that, although these two literals have different types, their actual value (binary representation) is the same, and that's what == checks. When you do 3.000000000000000000000000001 == 3, you'll get True, because the floating point precision is limited. To check this, you can go to https://www.hschmidt.net/FloatConverter/IEEE754.html and do some tests.
Posts: 39
Threads: 12
Joined: Feb 2020
(Feb-10-2020, 06:06 AM)karkas Wrote: About why you are getting True when doing 999. == 999: note that, although these two literals have different types, their actual value (binary representation) is the same, and that's what == checks. When you do 3.000000000000000000000000001 == 3, you'll get True, because the floating point precision is limited. To check this, you can go to https://www.hschmidt.net/FloatConverter/IEEE754.html and do some tests.
Thank you. Is there a way to display the binary representation of numbers in Python/pandas?
Posts: 1,666
Threads: 6
Joined: May 2017
import struct

def double_to_bin(value):
    """
    Convert a float to its double (64 bit) binary representation
    Please check here: http://www.binaryconvert.com/result_double.html
    """
    # Reassemble the 8 little-endian bytes into one 64-bit integer
    value_sum = sum(
        byte << (shift * 8)
        for shift, byte in
        enumerate(struct.pack('<d', value))
    )
    bin_str = f'{value_sum:064b}'
    return {
        'dec': value,
        'hex': f'{value_sum:016x}',  # 64 bits are 16 hex digits
        'bin': bin_str,
        'sign': bin_str[0] == '1',
        'exponent': bin_str[1:12],
        'mantissa': bin_str[12:],
    }
Better check it online to see if the result is right.
I checked it with 1.0 and 0.3.
My code examples are always for Python >=3.6.0
Posts: 39
Threads: 12
Joined: Feb 2020
Feb-11-2020, 07:49 PM
(This post was last modified: Feb-11-2020, 07:50 PM by new_to_python.)
(Feb-11-2020, 02:54 PM)DeaD_EyE Wrote: Better you check it online if the result is right. I checked it with 1.0 and 0.3.
Thanks. In this case, 999 and 999.0 produce the same binary representation, which is:
'0100000010001111001110000000000000000000000000000000000000000000'
So Python treats them as the same?
Posts: 39
Threads: 12
Joined: Feb 2020
Hi DeaD_EyE, the binary representation of 999 is: 1111100111
How come your code produced a very very long binary representation which is different from 1111100111?
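A possible way to see how the two patterns relate, sketched with the standard library only (variable names are illustrative). The long string is the 64-bit IEEE 754 double layout of 999.0: 1 sign bit, 11 exponent bits and 52 mantissa bits. The short string 1111100111 is the plain integer. The sketch splits the double into its fields and rebuilds the value from them, assuming a normal (not subnormal) number.

```python
import struct

n = 999

# Plain binary of the integer
int_bits = bin(n)[2:]

# 64-bit IEEE 754 pattern of the same value stored as a double
raw = struct.unpack(">Q", struct.pack(">d", float(n)))[0]
bits = format(raw, "064b")
sign, exponent, mantissa = bits[0], bits[1:12], bits[12:]

# value = (-1)^sign * 1.mantissa * 2^(exponent - 1023) for normal numbers
power = int(exponent, 2) - 1023
significand = 1 + int(mantissa, 2) / 2**52
rebuilt = (-1) ** int(sign) * significand * 2**power

print(int_bits)
print(sign, exponent, mantissa)
print(power, rebuilt)

# The integer's bits after its leading 1 reappear at the start of the mantissa
print(mantissa.startswith(int_bits[1:]))
```

In other words, 999 is stored as 1.111100111 (binary) times 2 to the power 9: the leading 1 is implicit, the remaining bits open the mantissa, and the exponent field holds 9 + 1023.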
