Python Forum
Insert values into a column in the same table based on a criteria
Thread Rating:
  • 2 Vote(s) - 2.5 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Insert values into a column in the same table based on a criteria
#1
I need your help to insert vales into my table based on a criteria.
My dataframe is as follows
df = pd.DataFrame([['2016-04-19',533247,1975000,np.nan], ['2016-04-19',533247,np.nan,1975000],['2016-05-05',np.nan,1947293.42,np.nan],['2016-05-05',np.nan,np.nan,1947293.42],['2016-06-03',535199,1930000,np.nan],['2016-06-04',535199,np.nan,1930000],['2016-08-04',np.nan,195000,np.nan],['2016-08-05',628490,5000,np.nan],['2016-08-05',628490,np.nan,5000],['2016-12-30',750200,15000,np.nan],['2016-12-31',326500,np.nan,15000]], columns=['DATE','CHQNO','Deposit','Withdrawal'])
I need to insert a value as 'RTN' into a new column call 'Status' if there are two records with same value in df[CHQNO] column & same value in df[Deposit] & df[Withdrawal] columns.

I'm expecting a dataframe like this


Output:
DATE        CHQNO Deposit        Withdrawal Status 0 2016-04-19 533247 1975000.0                 RTN 1 2016-04-19 533247                1975000.0 RTN 2 2016-05-05       1947293.42 3 2016-05-05                        1947293.42 4 2016-06-03 535199 1930000.0                 RTN 5 2016-06-04 535199                 1930000.0 RTN 6 2016-08-04         195000.0 7 2016-08-05 628490    5000.0                 RTN 8 2016-08-05 628490                     5000.0 RTN 9 2016-12-30 750200 15000.0 10 2016-12-31 326500                    15000.0
Thank you very much.
Moderator Larz60+: Please use output tags (I added this time)

Attached Files

Thumbnail(s)
   
Reply
#2
You can check for lines that have same CHQNO as following line and deposit is same as following withdrawal (first_one) and for lines that has same CHQNO as previous line and withdrawal is same as previous deposit (second_one).

This works only for deposit "followed" by withdrawal, otherwise you would need to either sort it or use more conditions of same type. Or it could be done by "SQL-joining" dataframe with itself on chqno and withdrawal==deposit followed by some cleaning/processing.
Output:
In [4]: first_one = (df.CHQNO == df.CHQNO.shift(-1)) & (df.Deposit == df.Withdrawal.shift(-1)) In [5]: second_one = (df.CHQNO == df.CHQNO.shift(1)) & (df.Deposit.shift(1) == df.Withdrawal) In [6]: df['rtn'] = np.where(first_one | second_one, "RTN", "") In [7]: df Out[7]:           DATE     CHQNO     Deposit  Withdrawal  rtn 0   2016-04-19  533247.0  1975000.00         NaN  RTN 1   2016-04-19  533247.0         NaN  1975000.00  RTN 2   2016-05-05       NaN  1947293.42         NaN      3   2016-05-05       NaN         NaN  1947293.42      4   2016-06-03  535199.0  1930000.00         NaN  RTN 5   2016-06-04  535199.0         NaN  1930000.00  RTN 6   2016-08-04       NaN   195000.00         NaN      7   2016-08-05  628490.0     5000.00         NaN  RTN 8   2016-08-05  628490.0         NaN     5000.00  RTN 9   2016-12-30  750200.0    15000.00         NaN      10  2016-12-31  326500.0         NaN    15000.00    
Reply
#3
(Apr-13-2017, 08:29 AM)zivoni Wrote: it could be done by "SQL-joining" dataframe with itself on chqno and withdrawal==deposit followed by some cleaning/processing.

Thanks, your code worked pefectly on this sample dataframe. I'm interested to know how this could be done using SQL-joining. Because in the actual full data set i came a cross some CHQNO entries comes after few rows (ie. not ajoined each other). And also there are instances where withdrawal comes first & the deposit comes after as you mentioned correctly.


Actualy i worked this out using T-SQL (MSSQL). However i'm more interested to work out this using python/Pandas.
I'm jst coping my T-SQL code FYI.

CREATE TABLE [TEST].[dbo].[bank] (
[DATE] datetime,
[CHQNO] float,
[Deposit] float,
[Withdrawal] float,
[Status] nvarchar(255)
)

INSERT INTO [Test].[dbo].[bank]
([DATE]
,[CHQNO]
,[Deposit]
,[Withdrawal])
VALUES
('2016-04-19',533247,1975000,NULL)
,('2016-04-19',533247,NULL,1975000)
,('2016-05-05',NULL,1947293.42,NULL)
,('2016-05-05',NULL,1947293.42,NULL)
,('2016-06-03',535199,1930000,NULL)
,('2016-06-04',535199,NULL,1930000)
,('2016-08-04',NULL,195000,NULL)
,('2016-08-05',628490,5000,NULL)
,('2016-08-05',628490,NULL,5000)
,('2016-12-30',750200,15000,NULL)
,('2016-12-31',326500,NULL,15000)

--1st method

SELECT
[DATE] ,
[CHQNO] ,
[Deposit] ,
[Withdrawal]
,CASE WHEN EXISTS (SELECT * FROM [Test].[dbo].bank AS D WHERE d.chqno = b.chqno AND d.withdrawal = b.deposit) THEN 'RTN'
WHEN EXISTS (SELECT * FROM [Test].[dbo].bank AS D WHERE d.chqno = b.chqno AND d.deposit = b.withdrawal) THEN 'RTN'
ELSE NULL
END AS STATUS
FROM
[Test].[dbo].Bank as B


--2nd Method

update a
set a.status='RTN'
from test.dbo.bank as a
where a.chqno is not null
and exists (select 1
from test.dbo.bank as b
where b.chqno=a.chqno
group by b.chqno
having sum(isnull(deposit,0))=sum(isnull(withdrawal,0))
)
Reply
#4
If there are only two lines for each CHQNO and only one of withdrawal/deposit is used on each row, then you can sort your df to be "like" your example. And if there could be more than two lines with same CHQNO or more withdrawal/deposits, then your 2nd method does something else than 1st one.

As I mentioned in previous post, with pandas you can do basically same as with your first sql method - join on chqno/withdrawal/deposit and filter.
Output:
In [4]: df['idx'] = df.index In [5]: aux_df = pd.merge(df, df, left_on=['CHQNO', 'Deposit'], right_on=['CHQNO', 'Withdrawal']) In [6]: df['rtn'] = "" In [7]: df.rtn[aux_df.idx_x[~aux_df.CHQNO.isnull()]] = "RTN" In [8]: df Out[8]:           DATE     CHQNO     Deposit  Withdrawal  idx  rtn 0   2016-04-19  533247.0  1975000.00         NaN    0  RTN 1   2016-04-19  533247.0         NaN  1975000.00    1  RTN 2   2016-05-05       NaN  1947293.42         NaN    2      3   2016-05-05       NaN         NaN  1947293.42    3      4   2016-06-03  535199.0  1930000.00         NaN    4  RTN 5   2016-06-04  535199.0         NaN  1930000.00    5  RTN 6   2016-08-04       NaN   195000.00         NaN    6      7   2016-08-05  628490.0     5000.00         NaN    7  RTN 8   2016-08-05  628490.0         NaN     5000.00    8  RTN 9   2016-12-30  750200.0    15000.00         NaN    9      10  2016-12-31  326500.0         NaN    15000.00   10
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  attempt to split values from within a dataframe column mbrown009 8 2,321 Apr-10-2023, 02:06 AM
Last Post: mbrown009
  Make unique id in vectorized way based on text data column with similarity scoring ill8 0 883 Dec-12-2022, 03:22 AM
Last Post: ill8
  Increase df column values decimals SriRajesh 2 1,106 Nov-14-2022, 05:20 PM
Last Post: deanhystad
Question How does one clean a populated table in MySQL/MariaDB? Copying values across tables? BrandonKastning 2 1,563 Jan-17-2022, 05:46 AM
Last Post: BrandonKastning
  New Dataframe Column Based on Several Conditions nb1214 1 1,799 Nov-16-2021, 10:52 PM
Last Post: jefsummers
  pandas: Compute the % of the unique values in a column JaneTan 1 1,775 Oct-25-2021, 07:55 PM
Last Post: jefsummers
  update values in one dataframe based on another dataframe - Pandas iliasb 2 9,227 Aug-14-2021, 12:38 PM
Last Post: jefsummers
  Pandas Data frame column condition check based on length of the value aditi06 1 2,683 Jul-28-2021, 11:08 AM
Last Post: jefsummers
Question [Solved] How to refer to dataframe column name based on a list lorensa74 1 2,255 May-17-2021, 07:02 AM
Last Post: lorensa74
  Add column based on others timste 8 4,011 Apr-03-2021, 07:39 AM
Last Post: devesh_sahu

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020