 How to define Missing value on some certain conditions?
#1
Hello everyone.
I am working on a project that consists of finding missing data in a table generated from SQL Server.
I am showing a simplified model that has only two customers; mine has hundreds.

Please check the image:
[Image]

I need to build a function that does the following, step by step:
1. Check IF either From date or To date is blank (for simplicity, let's say To date).
2. IF there is a blank cell, then check IF 'Customer' and 'Account' in this row are the same as in the rows above and below (the columns are sorted in ascending order). IF this is not true, write "Missing"; otherwise do a deeper analysis.
3. IF 'Customer' and 'Account' are the same, then check IF the dates are continuous. For example, for the first one, index[1]:
IF To date[0] = From date[3], then write "Not Missing"; otherwise write "Missing".

This is my first project in Python, and I would really appreciate your help.

PS: Can this case be handled with the sklearn.preprocessing library?
#2
I would suggest using the Pandas package.
It is designed to handle the cases you just described.

sklearn.preprocessing is primarily about applying transformations to numerical or categorical data, e.g. scaling and encoding. In your case you need to handle missing data, select data by conditions, and change it. Pandas is designed exactly for that.
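As a rough sketch of what that looks like in practice (the column names and values below are invented for illustration and are not taken from the original table), isnull finds the blank cells and .loc with a boolean mask selects and changes rows by condition:

```python
import pandas as pd

# Hypothetical toy table; the column names are assumptions for illustration.
df = pd.DataFrame({
    'Customer': ['A', 'A', 'A', 'B'],
    'Account': ['C', 'C', 'C', 'D'],
    'From': [1, None, 3, 5],
    'To': [2, None, 4, None],
})

# Boolean mask: True wherever 'To' is blank (step 1 of the question).
to_blank = df['To'].isnull()

# Select rows by condition and write a label into a new column.
df['Status'] = None
df.loc[to_blank, 'Status'] = 'check'

blank_count = int(to_blank.sum())
print(df)
```

The same pattern (build a mask, then assign through .loc) extends to more complex conditions by combining masks with & and |.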
#3
Thank you for your suggestion, scidam!
I need some guidance, if possible. Can I handle this case with a while loop or some IF statements?
#4
I am not sure I got everything right, but the following example, which I just wrote, should handle
your task:
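import pandas as pd

# Toy data: plain numbers stand in for dates to keep the example short.
data = pd.DataFrame({'A': ['A'] * 10 + ['B'] * 7,  # Customer
                     'B': ['C'] * 7 + ['D'] * 10,  # Account
                     'From': [1, None, 2, None, 5, None, None, 7, None, 8, 10, 12, 14, None, 15, None, None],  # From date
                     'To': [2, None, 3, None, 6, None, None, 8, None, 9, 11, 13, 15, None, 16, None, None],  # To date
                     'N': [None] * 17})  # 'M' (missing) or 'NM' (not missing); undefined by default


def fill_data(df):
    """Label single empty rows inside one Customer/Account group."""
    result = df.copy()
    both_empty = df.From.isnull() & df.To.isnull()
    # 'To' from two rows earlier, i.e. the row just above the empty one.
    prev_to = df.To.shift(2)
    # Only compare when both neighbouring values are actually present.
    known = df.From.notnull() & prev_to.notnull()
    # The comparison is computed on the row below the gap, so shift it up
    # one row to line it up with the empty row itself. fill_value=False
    # keeps the mask boolean instead of introducing NaN.
    continuous = ((df.From == prev_to) & known).shift(-1, fill_value=False)
    broken = ((df.From != prev_to) & known).shift(-1, fill_value=False)
    result.loc[continuous & both_empty, 'N'] = 'NM'
    result.loc[broken & both_empty, 'N'] = 'M'
    return result


# Apply the check separately to each Customer/Account pair.
result = data.groupby(['A', 'B']).apply(fill_data)
print(result.reset_index(drop=True))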
Note: You will need to convert 'From' and 'To' to datetime objects (see the pd.to_datetime function).
In this example I am using numbers to fill these columns for simplicity.
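A minimal sketch of that conversion, assuming the dates arrive as ISO-formatted strings (the real format depends on how the table is exported, so the values below are made up):

```python
import pandas as pd

# Hypothetical date strings; the actual export format is an assumption.
raw = pd.DataFrame({
    'From': ['2019-01-01', None, '2019-02-01'],
    'To': ['2019-01-15', None, '2019-02-20'],
})

dates = raw.copy()
for col in ['From', 'To']:
    # Blank cells become NaT, which isnull() treats as missing.
    dates[col] = pd.to_datetime(dates[col])

# Rows where both dates are blank.
both_empty = dates['From'].isnull() & dates['To'].isnull()

# For each row: does 'From' in the row below equal 'To' in the row above?
continuous = dates['From'].shift(-1) == dates['To'].shift(1)

print(dates.dtypes)
print(both_empty & ~continuous)
```

Comparisons with == work on datetime columns just as they do on numbers. Whether "continuity" should mean the same day or the following day is a decision for your data; the equality test here is only one possible reading.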