Python Forum

Hi,
I want to calculate the column mean and row mean while skipping "na" and non-numeric values. I use the code below, but it gives the following warning:

__main__:5: FutureWarning: convert_objects is deprecated. Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.

import pandas as pd
df = pd.read_csv(r"D:\Data\PythonCodes\outsummary2.csv")


print((df[['s2']].convert_objects(convert_numeric=True)).mean(skipna=True))

My CSV data:

	s1	s2	s3
A	1	2	1
B	na	5	3
U	1	na	0
Z	0		
Z	0	2	2

Is there an efficient way to skip non-numeric values and "na" when calculating a column mean (or sum)?

pd.to_numeric(df['s2'], errors='coerce').mean()
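The same idea can be applied to every column at once, which also gives the row means asked about in the question. This is only a sketch: the sample table is rebuilt by hand instead of being read from the CSV, blank cells are represented as None, and the names raw, numeric, col_means and row_means are made up for illustration.

```python
import pandas as pd

# Hand-built copy of the sample data (assumption: blank cells become None)
raw = pd.DataFrame(
    {
        "s1": ["1", "na", "1", "0", "0"],
        "s2": ["2", "5", "na", None, "2"],
        "s3": ["1", "3", "0", None, "2"],
    },
    index=["A", "B", "U", "Z", "Z"],
)

# Convert every column; anything that is not a number becomes NaN
numeric = raw.apply(pd.to_numeric, errors="coerce")

# mean() skips NaN by default (skipna=True)
col_means = numeric.mean()        # one value per column
row_means = numeric.mean(axis=1)  # one value per row

print(col_means)
print(row_means)
```

Replacing mean() with sum() gives the column or row sums with the same NaN-skipping behaviour.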

It works, thanks a lot. Can we get the row index of the actual numeric values?

>>> df
    s1
0   1
1  na
2   1
3   0
4   0
>>> for i, num in df['s1'].items():
...   if type(num) is int:
...     print(i)
...
0
2
3
4

import pandas as pd
df = pd.read_csv(r"D:\Data\outsummary2.csv")
df2 = df.apply(pd.to_numeric, errors='coerce')
for i, num in df2['s2'].items():
    print(num, type(num))
    if type(num) is int:
        print(i)

Output:
The type is always float:
2.0 <class 'numpy.float64'>
5.0 <class 'numpy.float64'>
nan <class 'numpy.float64'>
nan <class 'numpy.float64'>
2.0 <class 'numpy.float64'>
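The float type comes from the NaN values: NaN is itself a float, so a column that contains one is stored as float64 and a check like type(num) is int never matches. A small sketch of this, using hand-made Series in place of the CSV column (the names with_nan and without_nan are illustrative only):

```python
import pandas as pd

# Column with a non-numeric entry and a missing cell, as in s2
with_nan = pd.to_numeric(pd.Series(["2", "5", "na", None, "2"]), errors="coerce")

# Column where every entry is a valid integer
without_nan = pd.to_numeric(pd.Series(["2", "5"]))

print(with_nan.dtype)     # float64, because NaN forces a float column
print(without_nan.dtype)  # int64, no NaN present
```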

>>> import numpy as np
>>> df2
0    2.0
1    5.0
2    NaN
3    2.0
Name: s2, dtype: float64
>>> for i, num in df2.items():
...   if not np.isnan(num):
...     print(num)
...
2.0
5.0
2.0
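To get the row index rather than the values, the loop can be replaced by a boolean mask. A minimal sketch, assuming the s2 column from the question is rebuilt by hand with a default integer index (the names s2, converted and numeric_idx are illustrative):

```python
import pandas as pd

# Hand-built copy of column s2; the blank cell is represented as None
s2 = pd.Series(["2", "5", "na", None, "2"], name="s2")

# Non-numeric entries become NaN
converted = pd.to_numeric(s2, errors="coerce")

# Keep only the rows that hold a real number and read off their index labels
numeric_idx = converted[converted.notna()].index.tolist()

print(numeric_idx)
```

With the CSV loaded as df, the same mask would be built from pd.to_numeric(df['s2'], errors='coerce').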

Mekala

anbu23
