Python Forum

Full Version: How can i get the top smallest and top highest value for each row in pandas?
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
I have data that I have sorted, below is a sample of the data:

Output:
missing_values count 0 51 3 1 12 12 13 1 15 1 16 1 21 1 35 2 36 3 40 1
I have the following code:

# Get the vount of each missing value
missing_value_count = missing_values.iloc[:, 0:1].value_counts().to_frame()
missing_value_count.sort_index(inplace=True)

missing_value_count.to_csv('question.csv')
        
missing_value_count.agg(lambda s: pd.Series([*s.nlargest().index, *s.nsmallest().index],
                           ['missing_values']),
                           axis='columns')
When I run the code I get the following error:

Output:
missing_value_count.agg(lambda s: pd.Series([*s.nlargest().index, *s.nsmallest().index], ['missing_values']), axis='columns') Traceback (most recent call last): Cell In[29], line 1 missing_value_count.agg(lambda s: pd.Series([*s.nlargest().index, *s.nsmallest().index], File D:\Users\Mahmoud\anaconda3\Lib\site-packages\pandas\core\frame.py:9196 in aggregate result = op.agg() File D:\Users\Mahmoud\anaconda3\Lib\site-packages\pandas\core\apply.py:699 in agg result = self.obj.apply(self.orig_f, axis, args=self.args, **self.kwargs) File D:\Users\Mahmoud\anaconda3\Lib\site-packages\pandas\core\frame.py:9423 in apply return op.apply().__finalize__(self, method="apply") File D:\Users\Mahmoud\anaconda3\Lib\site-packages\pandas\core\apply.py:678 in apply return self.apply_standard() File D:\Users\Mahmoud\anaconda3\Lib\site-packages\pandas\core\apply.py:798 in apply_standard results, res_index = self.apply_series_generator() File D:\Users\Mahmoud\anaconda3\Lib\site-packages\pandas\core\apply.py:814 in apply_series_generator results[i] = self.f(v) Cell In[29], line 1 in <lambda> missing_value_count.agg(lambda s: pd.Series([*s.nlargest().index, *s.nsmallest().index], File D:\Users\Mahmoud\anaconda3\Lib\site-packages\pandas\core\series.py:500 in __init__ com.require_length_match(data, index) File D:\Users\Mahmoud\anaconda3\Lib\site-packages\pandas\core\common.py:576 in require_length_match raise ValueError( ValueError: Length of values (2) does not match length of index (1)
I want to return the lowest value in missing_values with the highest values in count. So in the above data the result will be

Output:
missing_values count 0 51
How can I modify this part of the code to get the result I want?

missing_value_count.agg(lambda s: pd.Series([*s.nlargest().index, *s.nsmallest().index],
                           ['missing_values']),
                           axis='columns')
Not really sure what you want to do, maybe this will help (or may be not?):

import pandas as pd
import numpy as np

# make a df
df = pd.DataFrame(np.random.randint(0,100,size=(20, 4)), columns=list('ABCD'))
# write the minimum of A,B, C, D to 'Minimum_in_row'
df['Minimum_in_row'] = df.loc[:, ['A', 'B', 'C', 'D']].min(axis=1)
# write the maximum of A,B, C, D to 'Maximum_in_row'
df['Maximum_in_row'] = df.loc[:, ['A', 'B', 'C', 'D']].max(axis=1)
# to find all rows with NaN in df1 and count them
cols = list(df1.columns.values)
df1['number_of_NaNs'] = df1[cols].isna().sum(1)
If that is no help, post your source data!
I think you want to do something with df1
import pandas as pd

data = {
    'missing_values': [0, 3, 12, 13, 15, 16, 21, 35, 36, 40],
    'count': [51, 1, 12, 1, 1, 1, 1, 2, 3, 1]
}

df = pd.DataFrame(data)
max_count_idx = df['count'].idxmax()
# Select the row with the maximum count
result = df.loc[max_count_idx]
# Convert the result to a DataFrame
result_df = result.to_frame().T
>>> result
missing_values     0
count             51
Name: 0, dtype: int64

>>> result_df
   missing_values  count
0               0     51
Without the input csv, I am not sure, but I think OP wants this:

Quote:I want to return the lowest value in missing_values

df['missing_values'].min()
plus:

Quote:the highest values in count.

df['count'].max()
If that is correct, then

wanted = {
    'missing_values': [df['missing_values'].min()],
    'count': [df['count'].max()]
    }
and the result would be:

resultdf = pd.DataFrame(wanted)
Gives:

Output:
missing_values count 0 0 51
I think that can be streamlined a bit!

resultdf = pd.DataFrame([[df['missing_values'].min(), df['count'].max()]], columns=['missing_values','count'])
To find the smallest and largest values in each row of a pandas DataFrame, you can use the min() and max() functions along the appropriate axis. Here's how:

python
Copy code
import pandas as pd

# Create a sample DataFrame
data = {'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]}

df = pd.DataFrame(data)

# Find the smallest value in each row
smallest_values = df.min(axis=1)

# Find the largest value in each row
largest_values = df.max(axis=1)

print("Smallest values in each row:")
print(smallest_values)

print("\nLargest values in each row:")
print(largest_values)
This will output the smallest and largest values for each row in the DataFrame.