Bottom Page

Thread Rating:
  • 3 Vote(s) - 2.67 Average
  • 1
  • 2
  • 3
  • 4
  • 5
How to define Missing value on some certain conditions?
#1
Hello everyone.
I am working on a project that consist in finding missing data in a table generated from SQL server.
I am showing a simplified model which has only two customer, mine has hundreds.

Please check the image:
İmage


I need to build a function that does this(step by step):
1. Check IF either From date or To date is blank(for simplicity let’s say To date)
2. IF there is a blank cell than to check IF ‘Customer’ and ‘Account’ in this row are the same with the ones above and below (columns are sorted in ascending order). IF this is not true than to write “Missing” otherwise to make some deeper analysis.
3. IF the ‘Customer’ and ‘Account’ are the same than to check IF there is any continuity between dates. For example the first one, index[1]:
IF To date[0]=From Date[3] than write Not Missing otherwise Missing

This is my first project it Python and I will really appreciate if you would help me.

PS: Can this case be handled by using sklearn.preprocessing library?
Quote
#2
I would suggest you to use Pandas package.
It is designed to handle the cases you just described.

sklearn.preprocessing is primarily about applying some transformations to data (numerical or categorical data), e.g. scaling, encoding etc. In your case you need to handle missing data, select data by conditions and change them. Pandas is designed exactly for that.
Quote
#3
Thank you for your suggestion scidam!
I need some guidance if it possible. Can I handle this case by using while loop or some IF statements?
Quote
#4
I am not sure that I did all right, but the following example I just wrote should handle
your task:
import pandas as pd

data = pd.DataFrame({'A': ['A'] * 10 + ['B'] * 7, # Customer
                    'B': ['C'] * 7 + ['D'] * 10, # Account
                    'From': [1, None, 2, None, 5, None, None, 7, None, 8, 10, 12, 14, None, 15, None, None], #From date
                    'To': [2, None, 3, None, 6, None, None, 8, None, 9, 11, 13, 15, None, 16, None, None],  # To date
                    'N': [None] * 17}) # Missing or Not missing, all values aren't defined by default


def fill_data(df):
    result = df.copy()
    both_empty = df.From.isnull() & df.To.isnull()
    result.loc[((df.From == df.To.shift(2)) & (~df.From.isnull()) & (~df.To.shift(2).isnull())).shift(-1) & both_empty, 'N'] = 'NM'
    result.loc[((df.From != df.To.shift(2)) & (~df.From.isnull()) & (~df.To.shift(2).isnull())).shift(-1) & both_empty, 'N'] = 'M'
    return result


result = data.groupby(['A', 'B']).apply(fill_data)
print(result.reset_index(drop=True))
Note: You will need to convert 'From' and 'To' to datetime objects (see pd.to_datetime function).
In this example I am using numbers to fill these columns for simplicity.
Quote

Top Page

Possibly Related Threads...
Thread Author Replies Views Last Post
  Missing something here Robpl1 2 88 Jan-14-2019, 10:35 AM
Last Post: Gribouillis
  help for Kaggle Titanic Set fill the missing Age by median age of Pclass and Sex Parthasarathi009 2 146 Nov-21-2018, 06:50 PM
Last Post: Parthasarathi009
  If conditions with time limit unknowntothem 4 217 Nov-09-2018, 08:59 PM
Last Post: nilamo
  edit text files/ add lines if missing (regex) wardancer84 3 188 Nov-08-2018, 02:47 PM
Last Post: wardancer84
  go_to() missing 1 required positional argument: 'url' HMAeymo 1 452 Aug-15-2018, 08:38 PM
Last Post: micseydel
  lxml missing libxml2, libxslt cawtx 3 1,089 Jul-13-2018, 11:49 PM
Last Post: snippsat
  SyntaxError: Missing parentheses in call to 'print' sofret 3 1,087 Jun-15-2018, 03:23 PM
Last Post: buran
  3 conditions to check and return ilcaa72 3 437 May-24-2018, 03:13 AM
Last Post: ilcaa72
  Fill as list (with conditions). EmericCrue 6 611 May-11-2018, 03:21 PM
Last Post: nilamo
  Build applet utility missing MaxN 0 310 May-03-2018, 06:59 PM
Last Post: MaxN

Forum Jump:


Users browsing this thread: 1 Guest(s)