Insert values into a column in the same table based on a criteria - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: Data Science (https://python-forum.io/forum-44.html) +--- Thread: Insert values into a column in the same table based on a criteria (/thread-2828.html) |
Insert values into a column in the same table based on a criteria - klllmmm - Apr-13-2017 I need your help to insert vales into my table based on a criteria. My dataframe is as follows df = pd.DataFrame([['2016-04-19',533247,1975000,np.nan], ['2016-04-19',533247,np.nan,1975000],['2016-05-05',np.nan,1947293.42,np.nan],['2016-05-05',np.nan,np.nan,1947293.42],['2016-06-03',535199,1930000,np.nan],['2016-06-04',535199,np.nan,1930000],['2016-08-04',np.nan,195000,np.nan],['2016-08-05',628490,5000,np.nan],['2016-08-05',628490,np.nan,5000],['2016-12-30',750200,15000,np.nan],['2016-12-31',326500,np.nan,15000]], columns=['DATE','CHQNO','Deposit','Withdrawal'])I need to insert a value as 'RTN' into a new column call 'Status' if there are two records with same value in df[CHQNO] column & same value in df[Deposit] & df[Withdrawal] columns. I'm expecting a dataframe like this Thank you very much.Moderator Larz60+: Please use output tags (I added this time)
RE: Insert values into a column in the same table based on a criteria - zivoni - Apr-13-2017 You can check for lines that have same CHQNO as following line and deposit is same as following withdrawal (first_one) and for lines that has same CHQNO as previous line and withdrawal is same as previous deposit (second_one). This works only for deposit "followed" by withdrawal, otherwise you would need to either sort it or use more conditions of same type. Or it could be done by "SQL-joining" dataframe with itself on chqno and withdrawal==deposit followed by some cleaning/processing.
RE: Insert values into a column in the same table based on a criteria - klllmmm - Apr-13-2017 (Apr-13-2017, 08:29 AM)zivoni Wrote: it could be done by "SQL-joining" dataframe with itself on chqno and withdrawal==deposit followed by some cleaning/processing. Thanks, your code worked pefectly on this sample dataframe. I'm interested to know how this could be done using SQL-joining. Because in the actual full data set i came a cross some CHQNO entries comes after few rows (ie. not ajoined each other). And also there are instances where withdrawal comes first & the deposit comes after as you mentioned correctly. Actualy i worked this out using T-SQL (MSSQL). However i'm more interested to work out this using python/Pandas. I'm jst coping my T-SQL code FYI. CREATE TABLE [TEST].[dbo].[bank] ( [DATE] datetime, [CHQNO] float, [Deposit] float, [Withdrawal] float, [Status] nvarchar(255) ) INSERT INTO [Test].[dbo].[bank] ([DATE] ,[CHQNO] ,[Deposit] ,[Withdrawal]) VALUES ('2016-04-19',533247,1975000,NULL) ,('2016-04-19',533247,NULL,1975000) ,('2016-05-05',NULL,1947293.42,NULL) ,('2016-05-05',NULL,1947293.42,NULL) ,('2016-06-03',535199,1930000,NULL) ,('2016-06-04',535199,NULL,1930000) ,('2016-08-04',NULL,195000,NULL) ,('2016-08-05',628490,5000,NULL) ,('2016-08-05',628490,NULL,5000) ,('2016-12-30',750200,15000,NULL) ,('2016-12-31',326500,NULL,15000) --1st method SELECT [DATE] , [CHQNO] , [Deposit] , [Withdrawal] ,CASE WHEN EXISTS (SELECT * FROM [Test].[dbo].bank AS D WHERE d.chqno = b.chqno AND d.withdrawal = b.deposit) THEN 'RTN' WHEN EXISTS (SELECT * FROM [Test].[dbo].bank AS D WHERE d.chqno = b.chqno AND d.deposit = b.withdrawal) THEN 'RTN' ELSE NULL END AS STATUS FROM [Test].[dbo].Bank as B --2nd Method update a set a.status='RTN' from test.dbo.bank as a where a.chqno is not null and exists (select 1 from test.dbo.bank as b where b.chqno=a.chqno group by b.chqno having sum(isnull(deposit,0))=sum(isnull(withdrawal,0)) ) RE: Insert values into a column in the same table based on a criteria - zivoni - Apr-13-2017 If there are only two lines for each CHQNO and only one of withdrawal/deposit is used on each row, then you can sort your df to be "like" your example. And if there could be more than two lines with same CHQNO or more withdrawal/deposits, then your 2nd method does something else than 1st one. As I mentioned in previous post, with pandas you can do basically same as with your first sql method - join on chqno/withdrawal/deposit and filter.
|