Python Forum

I have data that I have sorted, below is a sample of the data:

Output:    missing_values	count
    0		                51
    3		                1
    12		                12
    13		                1
    15		                1
    16		                1
    21		                1
    35		                2
    36		                3
    40		                1

I have the following code:

# Get the vount of each missing value
missing_value_count = missing_values.iloc[:, 0:1].value_counts().to_frame()
missing_value_count.sort_index(inplace=True)

missing_value_count.to_csv('question.csv')
        
missing_value_count.agg(lambda s: pd.Series([*s.nlargest().index, *s.nsmallest().index],
                           ['missing_values']),
                           axis='columns')

When I run the code I get the following error:

Output:missing_value_count.agg(lambda s: pd.Series([*s.nlargest().index, *s.nsmallest().index],
                           ['missing_values']),
                           axis='columns')
    Traceback (most recent call last):

      Cell In[29], line 1
    missing_value_count.agg(lambda s: pd.Series([*s.nlargest().index, *s.nsmallest().index],

      File D:\Users\Mahmoud\anaconda3\Lib\site-packages\pandas\core\frame.py:9196 in aggregate
    result = op.agg()

      File D:\Users\Mahmoud\anaconda3\Lib\site-packages\pandas\core\apply.py:699 in agg
    result = self.obj.apply(self.orig_f, axis, args=self.args, **self.kwargs)

      File D:\Users\Mahmoud\anaconda3\Lib\site-packages\pandas\core\frame.py:9423 in apply
    return op.apply().__finalize__(self, method="apply")

      File D:\Users\Mahmoud\anaconda3\Lib\site-packages\pandas\core\apply.py:678 in apply
    return self.apply_standard()

      File D:\Users\Mahmoud\anaconda3\Lib\site-packages\pandas\core\apply.py:798 in apply_standard
    results, res_index = self.apply_series_generator()

      File D:\Users\Mahmoud\anaconda3\Lib\site-packages\pandas\core\apply.py:814 in apply_series_generator
    results[i] = self.f(v)

      Cell In[29], line 1 in <lambda>
    missing_value_count.agg(lambda s: pd.Series([*s.nlargest().index, *s.nsmallest().index],

      File D:\Users\Mahmoud\anaconda3\Lib\site-packages\pandas\core\series.py:500 in __init__
    com.require_length_match(data, index)

      File D:\Users\Mahmoud\anaconda3\Lib\site-packages\pandas\core\common.py:576 in require_length_match
    raise ValueError(

    ValueError: Length of values (2) does not match length of index (1)

I want to return the lowest value in missing_values with the highest values in count. So in the above data the result will be

Output:    missing_values      count
    0                          51

How can I modify this part of the code to get the result I want?

missing_value_count.agg(lambda s: pd.Series([*s.nlargest().index, *s.nsmallest().index],
                           ['missing_values']),
                           axis='columns')

Not really sure what you want to do, maybe this will help (or may be not?):

import pandas as pd
import numpy as np

# make a df
df = pd.DataFrame(np.random.randint(0,100,size=(20, 4)), columns=list('ABCD'))
# write the minimum of A,B, C, D to 'Minimum_in_row'
df['Minimum_in_row'] = df.loc[:, ['A', 'B', 'C', 'D']].min(axis=1)
# write the maximum of A,B, C, D to 'Maximum_in_row'
df['Maximum_in_row'] = df.loc[:, ['A', 'B', 'C', 'D']].max(axis=1)
# to find all rows with NaN in df1 and count them
cols = list(df1.columns.values)
df1['number_of_NaNs'] = df1[cols].isna().sum(1)

If that is no help, post your source data!
I think you want to do something with df1

import pandas as pd

data = {
    'missing_values': [0, 3, 12, 13, 15, 16, 21, 35, 36, 40],
    'count': [51, 1, 12, 1, 1, 1, 1, 2, 3, 1]
}

df = pd.DataFrame(data)
max_count_idx = df['count'].idxmax()
# Select the row with the maximum count
result = df.loc[max_count_idx]
# Convert the result to a DataFrame
result_df = result.to_frame().T

>>> result
missing_values     0
count             51
Name: 0, dtype: int64

>>> result_df
   missing_values  count
0               0     51

Without the input csv, I am not sure, but I think OP wants this:

Quote:I want to return the lowest value in missing_values

df['missing_values'].min()

plus:

Quote:the highest values in count.

df['count'].max()

If that is correct, then

wanted = {
    'missing_values': [df['missing_values'].min()],
    'count': [df['count'].max()]
    }

and the result would be:

resultdf = pd.DataFrame(wanted)

Gives:

Output:   missing_values  count
0               0     51

I think that can be streamlined a bit!

resultdf = pd.DataFrame([[df['missing_values'].min(), df['count'].max()]], columns=['missing_values','count'])

To find the smallest and largest values in each row of a pandas DataFrame, you can use the min() and max() functions along the appropriate axis. Here's how:

python
Copy code
import pandas as pd

# Create a sample DataFrame
data = {'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]}

df = pd.DataFrame(data)

# Find the smallest value in each row
smallest_values = df.min(axis=1)

# Find the largest value in each row
largest_values = df.max(axis=1)

print("Smallest values in each row:")
print(smallest_values)

print("\nLargest values in each row:")
print(largest_values)
This will output the smallest and largest values for each row in the DataFrame.

mahmoudr899

Pedroski55

snippsat

Pedroski55

marythodge4