Python Forum
Insert values into a column in the same table based on a criteria - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: Insert values into a column in the same table based on a criteria (/thread-2828.html)



Insert values into a column in the same table based on a criteria - klllmmm - Apr-13-2017

I need your help to insert vales into my table based on a criteria.
My dataframe is as follows
df = pd.DataFrame([['2016-04-19',533247,1975000,np.nan], ['2016-04-19',533247,np.nan,1975000],['2016-05-05',np.nan,1947293.42,np.nan],['2016-05-05',np.nan,np.nan,1947293.42],['2016-06-03',535199,1930000,np.nan],['2016-06-04',535199,np.nan,1930000],['2016-08-04',np.nan,195000,np.nan],['2016-08-05',628490,5000,np.nan],['2016-08-05',628490,np.nan,5000],['2016-12-30',750200,15000,np.nan],['2016-12-31',326500,np.nan,15000]], columns=['DATE','CHQNO','Deposit','Withdrawal'])
I need to insert a value as 'RTN' into a new column call 'Status' if there are two records with same value in df[CHQNO] column & same value in df[Deposit] & df[Withdrawal] columns.

I'm expecting a dataframe like this


Output:
DATE        CHQNO Deposit        Withdrawal Status 0 2016-04-19 533247 1975000.0                 RTN 1 2016-04-19 533247                1975000.0 RTN 2 2016-05-05       1947293.42 3 2016-05-05                        1947293.42 4 2016-06-03 535199 1930000.0                 RTN 5 2016-06-04 535199                 1930000.0 RTN 6 2016-08-04         195000.0 7 2016-08-05 628490    5000.0                 RTN 8 2016-08-05 628490                     5000.0 RTN 9 2016-12-30 750200 15000.0 10 2016-12-31 326500                    15000.0
Thank you very much.
Moderator Larz60+: Please use output tags (I added this time)


RE: Insert values into a column in the same table based on a criteria - zivoni - Apr-13-2017

You can check for lines that have same CHQNO as following line and deposit is same as following withdrawal (first_one) and for lines that has same CHQNO as previous line and withdrawal is same as previous deposit (second_one).

This works only for deposit "followed" by withdrawal, otherwise you would need to either sort it or use more conditions of same type. Or it could be done by "SQL-joining" dataframe with itself on chqno and withdrawal==deposit followed by some cleaning/processing.
Output:
In [4]: first_one = (df.CHQNO == df.CHQNO.shift(-1)) & (df.Deposit == df.Withdrawal.shift(-1)) In [5]: second_one = (df.CHQNO == df.CHQNO.shift(1)) & (df.Deposit.shift(1) == df.Withdrawal) In [6]: df['rtn'] = np.where(first_one | second_one, "RTN", "") In [7]: df Out[7]:           DATE     CHQNO     Deposit  Withdrawal  rtn 0   2016-04-19  533247.0  1975000.00         NaN  RTN 1   2016-04-19  533247.0         NaN  1975000.00  RTN 2   2016-05-05       NaN  1947293.42         NaN      3   2016-05-05       NaN         NaN  1947293.42      4   2016-06-03  535199.0  1930000.00         NaN  RTN 5   2016-06-04  535199.0         NaN  1930000.00  RTN 6   2016-08-04       NaN   195000.00         NaN      7   2016-08-05  628490.0     5000.00         NaN  RTN 8   2016-08-05  628490.0         NaN     5000.00  RTN 9   2016-12-30  750200.0    15000.00         NaN      10  2016-12-31  326500.0         NaN    15000.00    



RE: Insert values into a column in the same table based on a criteria - klllmmm - Apr-13-2017

(Apr-13-2017, 08:29 AM)zivoni Wrote: it could be done by "SQL-joining" dataframe with itself on chqno and withdrawal==deposit followed by some cleaning/processing.

Thanks, your code worked pefectly on this sample dataframe. I'm interested to know how this could be done using SQL-joining. Because in the actual full data set i came a cross some CHQNO entries comes after few rows (ie. not ajoined each other). And also there are instances where withdrawal comes first & the deposit comes after as you mentioned correctly.


Actualy i worked this out using T-SQL (MSSQL). However i'm more interested to work out this using python/Pandas.
I'm jst coping my T-SQL code FYI.

CREATE TABLE [TEST].[dbo].[bank] (
[DATE] datetime,
[CHQNO] float,
[Deposit] float,
[Withdrawal] float,
[Status] nvarchar(255)
)

INSERT INTO [Test].[dbo].[bank]
([DATE]
,[CHQNO]
,[Deposit]
,[Withdrawal])
VALUES
('2016-04-19',533247,1975000,NULL)
,('2016-04-19',533247,NULL,1975000)
,('2016-05-05',NULL,1947293.42,NULL)
,('2016-05-05',NULL,1947293.42,NULL)
,('2016-06-03',535199,1930000,NULL)
,('2016-06-04',535199,NULL,1930000)
,('2016-08-04',NULL,195000,NULL)
,('2016-08-05',628490,5000,NULL)
,('2016-08-05',628490,NULL,5000)
,('2016-12-30',750200,15000,NULL)
,('2016-12-31',326500,NULL,15000)

--1st method

SELECT
[DATE] ,
[CHQNO] ,
[Deposit] ,
[Withdrawal]
,CASE WHEN EXISTS (SELECT * FROM [Test].[dbo].bank AS D WHERE d.chqno = b.chqno AND d.withdrawal = b.deposit) THEN 'RTN'
WHEN EXISTS (SELECT * FROM [Test].[dbo].bank AS D WHERE d.chqno = b.chqno AND d.deposit = b.withdrawal) THEN 'RTN'
ELSE NULL
END AS STATUS
FROM
[Test].[dbo].Bank as B


--2nd Method

update a
set a.status='RTN'
from test.dbo.bank as a
where a.chqno is not null
and exists (select 1
from test.dbo.bank as b
where b.chqno=a.chqno
group by b.chqno
having sum(isnull(deposit,0))=sum(isnull(withdrawal,0))
)


RE: Insert values into a column in the same table based on a criteria - zivoni - Apr-13-2017

If there are only two lines for each CHQNO and only one of withdrawal/deposit is used on each row, then you can sort your df to be "like" your example. And if there could be more than two lines with same CHQNO or more withdrawal/deposits, then your 2nd method does something else than 1st one.

As I mentioned in previous post, with pandas you can do basically same as with your first sql method - join on chqno/withdrawal/deposit and filter.
Output:
In [4]: df['idx'] = df.index In [5]: aux_df = pd.merge(df, df, left_on=['CHQNO', 'Deposit'], right_on=['CHQNO', 'Withdrawal']) In [6]: df['rtn'] = "" In [7]: df.rtn[aux_df.idx_x[~aux_df.CHQNO.isnull()]] = "RTN" In [8]: df Out[8]:           DATE     CHQNO     Deposit  Withdrawal  idx  rtn 0   2016-04-19  533247.0  1975000.00         NaN    0  RTN 1   2016-04-19  533247.0         NaN  1975000.00    1  RTN 2   2016-05-05       NaN  1947293.42         NaN    2      3   2016-05-05       NaN         NaN  1947293.42    3      4   2016-06-03  535199.0  1930000.00         NaN    4  RTN 5   2016-06-04  535199.0         NaN  1930000.00    5  RTN 6   2016-08-04       NaN   195000.00         NaN    6      7   2016-08-05  628490.0     5000.00         NaN    7  RTN 8   2016-08-05  628490.0         NaN     5000.00    8  RTN 9   2016-12-30  750200.0    15000.00         NaN    9      10  2016-12-31  326500.0         NaN    15000.00   10