Bottom Page

Thread Rating:
  • 3 Vote(s) - 2.67 Average
  • 1
  • 2
  • 3
  • 4
  • 5
How to define Missing value on some certain conditions?
#1
Hello everyone.
I am working on a project that consist in finding missing data in a table generated from SQL server.
I am showing a simplified model which has only two customer, mine has hundreds.

Please check the image:
İmage


I need to build a function that does this(step by step):
1. Check IF either From date or To date is blank(for simplicity let’s say To date)
2. IF there is a blank cell than to check IF ‘Customer’ and ‘Account’ in this row are the same with the ones above and below (columns are sorted in ascending order). IF this is not true than to write “Missing” otherwise to make some deeper analysis.
3. IF the ‘Customer’ and ‘Account’ are the same than to check IF there is any continuity between dates. For example the first one, index[1]:
IF To date[0]=From Date[3] than write Not Missing otherwise Missing

This is my first project it Python and I will really appreciate if you would help me.

PS: Can this case be handled by using sklearn.preprocessing library?
Quote
#2
I would suggest you to use Pandas package.
It is designed to handle the cases you just described.

sklearn.preprocessing is primarily about applying some transformations to data (numerical or categorical data), e.g. scaling, encoding etc. In your case you need to handle missing data, select data by conditions and change them. Pandas is designed exactly for that.
Quote
#3
Thank you for your suggestion scidam!
I need some guidance if it possible. Can I handle this case by using while loop or some IF statements?
Quote
#4
I am not sure that I did all right, but the following example I just wrote should handle
your task:
import pandas as pd

data = pd.DataFrame({'A': ['A'] * 10 + ['B'] * 7, # Customer
                    'B': ['C'] * 7 + ['D'] * 10, # Account
                    'From': [1, None, 2, None, 5, None, None, 7, None, 8, 10, 12, 14, None, 15, None, None], #From date
                    'To': [2, None, 3, None, 6, None, None, 8, None, 9, 11, 13, 15, None, 16, None, None],  # To date
                    'N': [None] * 17}) # Missing or Not missing, all values aren't defined by default


def fill_data(df):
    result = df.copy()
    both_empty = df.From.isnull() & df.To.isnull()
    result.loc[((df.From == df.To.shift(2)) & (~df.From.isnull()) & (~df.To.shift(2).isnull())).shift(-1) & both_empty, 'N'] = 'NM'
    result.loc[((df.From != df.To.shift(2)) & (~df.From.isnull()) & (~df.To.shift(2).isnull())).shift(-1) & both_empty, 'N'] = 'M'
    return result


result = data.groupby(['A', 'B']).apply(fill_data)
print(result.reset_index(drop=True))
Note: You will need to convert 'From' and 'To' to datetime objects (see pd.to_datetime function).
In this example I am using numbers to fill these columns for simplicity.
Quote

Top Page

Possibly Related Threads...
Thread Author Replies Views Last Post
  Missing required dependencies when using pyinstaller Ghonim 14 469 Mar-08-2019, 09:54 AM
Last Post: dmag
  How to manually define color bar scale in seaborn heatmap SriRajesh 2 83 Mar-06-2019, 01:09 PM
Last Post: SriRajesh
  missing 1 required positional argument error nikos 3 175 Feb-28-2019, 12:53 PM
Last Post: nikos
  Missing 2 Required Positional Arguments: SwiftWater 1 83 Feb-28-2019, 08:57 AM
Last Post: buran
  dynamically define class's michavardy 1 83 Feb-26-2019, 04:20 PM
Last Post: buran
  Most efficient way to define sub keys of a dictionary? wrybread 1 121 Feb-21-2019, 12:23 AM
Last Post: snippsat
  breaking even if conditions is not met Naito 5 301 Feb-03-2019, 12:30 PM
Last Post: perfringo
  Missing something here Robpl1 2 167 Jan-14-2019, 10:35 AM
Last Post: Gribouillis
  help for Kaggle Titanic Set fill the missing Age by median age of Pclass and Sex Parthasarathi009 2 291 Nov-21-2018, 06:50 PM
Last Post: Parthasarathi009
  If conditions with time limit unknowntothem 4 293 Nov-09-2018, 08:59 PM
Last Post: nilamo

Forum Jump:


Users browsing this thread: 1 Guest(s)