Insert values into a column in the same table based on a criteria

klllmmm · (This post was last modified: Apr-13-2017, 06:04 AM by Larz60+.)

I need your help to insert vales into my table based on a criteria.
My dataframe is as follows

df = pd.DataFrame([['2016-04-19',533247,1975000,np.nan], ['2016-04-19',533247,np.nan,1975000],['2016-05-05',np.nan,1947293.42,np.nan],['2016-05-05',np.nan,np.nan,1947293.42],['2016-06-03',535199,1930000,np.nan],['2016-06-04',535199,np.nan,1930000],['2016-08-04',np.nan,195000,np.nan],['2016-08-05',628490,5000,np.nan],['2016-08-05',628490,np.nan,5000],['2016-12-30',750200,15000,np.nan],['2016-12-31',326500,np.nan,15000]], columns=['DATE','CHQNO','Deposit','Withdrawal'])

I need to insert a value as 'RTN' into a new column call 'Status' if there are two records with same value in df[CHQNO] column & same value in df[Deposit] & df[Withdrawal] columns.

I'm expecting a dataframe like this

Output:DATE        CHQNO Deposit        Withdrawal Status
0 2016-04-19 533247 1975000.0                 RTN
1 2016-04-19 533247                1975000.0 RTN
2 2016-05-05       1947293.42 
3 2016-05-05                        1947293.42 
4 2016-06-03 535199 1930000.0                 RTN
5 2016-06-04 535199                 1930000.0 RTN
6 2016-08-04         195000.0 
7 2016-08-05 628490    5000.0                 RTN
8 2016-08-05 628490                     5000.0 RTN
9 2016-12-30 750200 15000.0 
10 2016-12-31 326500                    15000.0

Thank you very much.
Moderator Larz60+: Please use output tags (I added this time)

***zivoni*** · Apr-13-2017, 08:29 AM

You can check for lines that have same CHQNO as following line and deposit is same as following withdrawal (first_one) and for lines that has same CHQNO as previous line and withdrawal is same as previous deposit (second_one).

This works only for deposit "followed" by withdrawal, otherwise you would need to either sort it or use more conditions of same type. Or it could be done by "SQL-joining" dataframe with itself on chqno and withdrawal==deposit followed by some cleaning/processing.

Output:In [4]: first_one = (df.CHQNO == df.CHQNO.shift(-1)) & (df.Deposit == df.Withdrawal.shift(-1))

In [5]: second_one = (df.CHQNO == df.CHQNO.shift(1)) & (df.Deposit.shift(1) == df.Withdrawal)

In [6]: df['rtn'] = np.where(first_one | second_one, "RTN", "")

In [7]: df
Out[7]: 
          DATE     CHQNO     Deposit  Withdrawal  rtn
0   2016-04-19  533247.0  1975000.00         NaN  RTN
1   2016-04-19  533247.0         NaN  1975000.00  RTN
2   2016-05-05       NaN  1947293.42         NaN     
3   2016-05-05       NaN         NaN  1947293.42     
4   2016-06-03  535199.0  1930000.00         NaN  RTN
5   2016-06-04  535199.0         NaN  1930000.00  RTN
6   2016-08-04       NaN   195000.00         NaN     
7   2016-08-05  628490.0     5000.00         NaN  RTN
8   2016-08-05  628490.0         NaN     5000.00  RTN
9   2016-12-30  750200.0    15000.00         NaN     
10  2016-12-31  326500.0         NaN    15000.00

klllmmm · (This post was last modified: Apr-13-2017, 09:26 AM by klllmmm.)

(Apr-13-2017, 08:29 AM)zivoni Wrote: it could be done by "SQL-joining" dataframe with itself on chqno and withdrawal==deposit followed by some cleaning/processing.

Thanks, your code worked pefectly on this sample dataframe. I'm interested to know how this could be done using SQL-joining. Because in the actual full data set i came a cross some CHQNO entries comes after few rows (ie. not ajoined each other). And also there are instances where withdrawal comes first & the deposit comes after as you mentioned correctly.

Actualy i worked this out using T-SQL (MSSQL). However i'm more interested to work out this using python/Pandas.
I'm jst coping my T-SQL code FYI.

CREATE TABLE [TEST].[dbo].[bank] (
[DATE] datetime,
[CHQNO] float,
[Deposit] float,
[Withdrawal] float,
[Status] nvarchar(255)
)

INSERT INTO [Test].[dbo].[bank]
([DATE]
,[CHQNO]
,[Deposit]
,[Withdrawal])
VALUES
('2016-04-19',533247,1975000,NULL)
,('2016-04-19',533247,NULL,1975000)
,('2016-05-05',NULL,1947293.42,NULL)
,('2016-05-05',NULL,1947293.42,NULL)
,('2016-06-03',535199,1930000,NULL)
,('2016-06-04',535199,NULL,1930000)
,('2016-08-04',NULL,195000,NULL)
,('2016-08-05',628490,5000,NULL)
,('2016-08-05',628490,NULL,5000)
,('2016-12-30',750200,15000,NULL)
,('2016-12-31',326500,NULL,15000)

--1st method

SELECT
[DATE] ,
[CHQNO] ,
[Deposit] ,
[Withdrawal]
,CASE WHEN EXISTS (SELECT * FROM [Test].[dbo].bank AS D WHERE d.chqno = b.chqno AND d.withdrawal = b.deposit) THEN 'RTN'
WHEN EXISTS (SELECT * FROM [Test].[dbo].bank AS D WHERE d.chqno = b.chqno AND d.deposit = b.withdrawal) THEN 'RTN'
ELSE NULL
END AS STATUS
FROM
[Test].[dbo].Bank as B

--2nd Method

update a
set a.status='RTN'
from test.dbo.bank as a
where a.chqno is not null
and exists (select 1
from test.dbo.bank as b
where b.chqno=a.chqno
group by b.chqno
having sum(isnull(deposit,0))=sum(isnull(withdrawal,0))
)

***zivoni*** · Apr-13-2017, 10:10 AM

If there are only two lines for each CHQNO and only one of withdrawal/deposit is used on each row, then you can sort your df to be "like" your example. And if there could be more than two lines with same CHQNO or more withdrawal/deposits, then your 2nd method does something else than 1st one.

As I mentioned in previous post, with pandas you can do basically same as with your first sql method - join on chqno/withdrawal/deposit and filter.

Output:In [4]: df['idx'] = df.index

In [5]: aux_df = pd.merge(df, df, left_on=['CHQNO', 'Deposit'], right_on=['CHQNO', 'Withdrawal'])

In [6]: df['rtn'] = ""

In [7]: df.rtn[aux_df.idx_x[~aux_df.CHQNO.isnull()]] = "RTN"

In [8]: df
Out[8]: 
          DATE     CHQNO     Deposit  Withdrawal  idx  rtn
0   2016-04-19  533247.0  1975000.00         NaN    0  RTN
1   2016-04-19  533247.0         NaN  1975000.00    1  RTN
2   2016-05-05       NaN  1947293.42         NaN    2     
3   2016-05-05       NaN         NaN  1947293.42    3     
4   2016-06-03  535199.0  1930000.00         NaN    4  RTN
5   2016-06-04  535199.0         NaN  1930000.00    5  RTN
6   2016-08-04       NaN   195000.00         NaN    6     
7   2016-08-05  628490.0     5000.00         NaN    7  RTN
8   2016-08-05  628490.0         NaN     5000.00    8  RTN
9   2016-12-30  750200.0    15000.00         NaN    9     
10  2016-12-31  326500.0         NaN    15000.00   10

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	attempt to split values from within a dataframe column	mbrown009	9	6,372	Jun-20-2024, 07:59 PM Last Post: AdamHensley
	Make unique id in vectorized way based on text data column with similarity scoring	ill8	0	1,514	Dec-12-2022, 03:22 AM Last Post: ill8
	Increase df column values decimals	SriRajesh	2	2,015	Nov-14-2022, 05:20 PM Last Post: deanhystad
	How does one clean a populated table in MySQL/MariaDB? Copying values across tables?	BrandonKastning	2	2,327	Jan-17-2022, 05:46 AM Last Post: BrandonKastning
	New Dataframe Column Based on Several Conditions	nb1214	1	2,631	Nov-16-2021, 10:52 PM Last Post: jefsummers
	pandas: Compute the % of the unique values in a column	JaneTan	1	2,520	Oct-25-2021, 07:55 PM Last Post: jefsummers
	update values in one dataframe based on another dataframe - Pandas	iliasb	2	13,889	Aug-14-2021, 12:38 PM Last Post: jefsummers
	Pandas Data frame column condition check based on length of the value	aditi06	1	3,921	Jul-28-2021, 11:08 AM Last Post: jefsummers
	[Solved] How to refer to dataframe column name based on a list	lorensa74	1	3,109	May-17-2021, 07:02 AM Last Post: lorensa74
	Add column based on others	timste	8	5,856	Apr-03-2021, 07:39 AM Last Post: devesh_sahu

Insert values into a column in the same table based on a criteria

User Panel Messages

Announcements