Bottom Page

Thread Rating:
  • 3 Vote(s) - 2.67 Average
  • 1
  • 2
  • 3
  • 4
  • 5
 How to define Missing value on some certain conditions?
Hello everyone.
I am working on a project that consist in finding missing data in a table generated from SQL server.
I am showing a simplified model which has only two customer, mine has hundreds.

Please check the image:

I need to build a function that does this(step by step):
1. Check IF either From date or To date is blank(for simplicity let’s say To date)
2. IF there is a blank cell than to check IF ‘Customer’ and ‘Account’ in this row are the same with the ones above and below (columns are sorted in ascending order). IF this is not true than to write “Missing” otherwise to make some deeper analysis.
3. IF the ‘Customer’ and ‘Account’ are the same than to check IF there is any continuity between dates. For example the first one, index[1]:
IF To date[0]=From Date[3] than write Not Missing otherwise Missing

This is my first project it Python and I will really appreciate if you would help me.

PS: Can this case be handled by using sklearn.preprocessing library?
I would suggest you to use Pandas package.
It is designed to handle the cases you just described.

sklearn.preprocessing is primarily about applying some transformations to data (numerical or categorical data), e.g. scaling, encoding etc. In your case you need to handle missing data, select data by conditions and change them. Pandas is designed exactly for that.
Thank you for your suggestion scidam!
I need some guidance if it possible. Can I handle this case by using while loop or some IF statements?
I am not sure that I did all right, but the following example I just wrote should handle
your task:
import pandas as pd

data = pd.DataFrame({'A': ['A'] * 10 + ['B'] * 7, # Customer
                    'B': ['C'] * 7 + ['D'] * 10, # Account
                    'From': [1, None, 2, None, 5, None, None, 7, None, 8, 10, 12, 14, None, 15, None, None], #From date
                    'To': [2, None, 3, None, 6, None, None, 8, None, 9, 11, 13, 15, None, 16, None, None],  # To date
                    'N': [None] * 17}) # Missing or Not missing, all values aren't defined by default

def fill_data(df):
    result = df.copy()
    both_empty = df.From.isnull() & df.To.isnull()
    result.loc[((df.From == df.To.shift(2)) & (~df.From.isnull()) & (~df.To.shift(2).isnull())).shift(-1) & both_empty, 'N'] = 'NM'
    result.loc[((df.From != df.To.shift(2)) & (~df.From.isnull()) & (~df.To.shift(2).isnull())).shift(-1) & both_empty, 'N'] = 'M'
    return result

result = data.groupby(['A', 'B']).apply(fill_data)
Note: You will need to convert 'From' and 'To' to datetime objects (see pd.to_datetime function).
In this example I am using numbers to fill these columns for simplicity.

Top Page

Forum Jump:

Users browsing this thread: 1 Guest(s)