Python Forum
create new column based on condition
#1
Hello
I want to create a new column that includes calculations based on existing columns.

[Image: gmg2jbM]

All rows use the same calculation except row 1.
I hope I have explained clearly what I want.

Thanks in advance.
Reply
#2
I cannot see your attachment, so it is difficult to help you.
Reply
#3
How can I share the image here?
Reply
#4
[Image: Capturea.png]
Reply
#5
I suggest using the following format (note: use the Python delimiters to display code):
import pandas as pd
df = pd.DataFrame(data=[[1,2,3],[4,5,6],[7,8,9]])
df[3] = df[0]+df[1]
df
Where I define df[3], you could instead call a function, use a lambda, or define the column any other way you want.
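As a rough sketch of those options on the same small DataFrame (the extra column labels 4 and 5 and the helper name row_total are only illustrative, not from the original question):

```python
import pandas as pd

df = pd.DataFrame(data=[[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Option 1: plain column arithmetic (vectorized)
df[3] = df[0] + df[1]

# Option 2: a lambda applied to each row (axis=1)
df[4] = df.apply(lambda row: row[0] + row[1], axis=1)

# Option 3: a named function (hypothetical helper) applied to each row
def row_total(row):
    return row[0] + row[1]

df[5] = df.apply(row_total, axis=1)

print(df)
```

All three new columns hold the same values here; the column arithmetic is usually the fastest, while apply is handy when the rule is too involved for a one-line expression.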
Reply
#6
I do not see any condition.

If I understand correctly what you're writing (index means row), then the following snippet works.

Of course, you can replace "Matrix" with the corresponding "A", "B" and "C".

import numpy as np
import time

r, c = 1_000_000, 3  # number of rows and columns

Matrix = np.arange(r*c).reshape(r, c)

# for all the matrix except the first row
t0=time.time()
D = np.zeros((r, 1))
i = np.arange(1, r)
D[i, 0] = Matrix[i-1, 2] - Matrix[i, 0] + Matrix[i, 1]

# only for the first row
i = 0
D[i, 0] = Matrix[i, 2] + Matrix[i, 0] - Matrix[i, 1]
D = D.astype(int)
t1 = time.time()
print(f"Duration = {t1-t0}")

print(f"D = {D}")
Output:
D = [[ 1] [ 3] [ 6] ... [2999991] [2999994] [2999997]]
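If A, B and C are columns of a pandas DataFrame rather than a NumPy array, the same idea could be sketched with shift. This is only an assumption about the layout, since the image is not visible here; the formulas are the ones used in the snippet above, and the sample values are made up.

```python
import pandas as pd

# Small made-up sample; A, B, C stand in for the real columns
df = pd.DataFrame({"A": [0, 3, 6, 9], "B": [1, 4, 7, 10], "C": [2, 5, 8, 11]})

# Rows 1..n-1: previous row's C minus this row's A plus this row's B.
# shift(1) moves C down by one row, so row 0 is NaN at this point.
df["D"] = df["C"].shift(1) - df["A"] + df["B"]

# Row 0 uses its own formula: C + A - B
df.loc[0, "D"] = df.loc[0, "C"] + df.loc[0, "A"] - df.loc[0, "B"]

# shift introduced floats (because of the NaN); convert back to integers
df["D"] = df["D"].astype(int)

print(df)
```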
Reply
#7
Could you kindly explain the code?
Reply
#8
Here I'm using a "vectorized" form instead of loops; vectorization is much faster, as you'll notice.

Remarks:
  • The Matrix array is just an example; you can use your own A, B, C instead
  • D is initialized up front to avoid dynamic (memory) allocation; it's good practice
  • i is the "index" vector (from 1 to r-1)
  • the expression for D follows directly from your formula
  • finally, the first row corresponds to index i=0
  • if you're working only with integers, it would have been better in my previous post to define the dtype of D/D2 directly
  • since the first row uses a different formula, it can be calculated before or after the main block
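To make the remarks above concrete, here is a minimal sketch on a tiny 4x3 array, small enough to check by hand. The intermediate names prev_C, cur_A and cur_B are only illustrative.

```python
import numpy as np

r, c = 4, 3
Matrix = np.arange(r * c).reshape(r, c)
# Matrix is:
# [[ 0  1  2]
#  [ 3  4  5]
#  [ 6  7  8]
#  [ 9 10 11]]

i = np.arange(1, r)        # index vector: [1, 2, 3]
prev_C = Matrix[i - 1, 2]  # column 2 of rows 0, 1, 2 -> [2, 5, 8]
cur_A = Matrix[i, 0]       # column 0 of rows 1, 2, 3 -> [3, 6, 9]
cur_B = Matrix[i, 1]       # column 1 of rows 1, 2, 3 -> [4, 7, 10]

D = np.zeros((r, 1), dtype=int)    # allocated once, with an integer dtype
D[i, 0] = prev_C - cur_A + cur_B   # rows 1..3 computed in one statement
D[0, 0] = Matrix[0, 2] + Matrix[0, 0] - Matrix[0, 1]  # first row: 2 + 0 - 1

print(D[:, 0])
```

Indexing with the array i selects all the listed rows at once, so a single assignment replaces a whole loop over the rows.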

Test the following code to compare vectorization with (classical) loops; you'll figure out how it works.

Hope it helps

import numpy as np
import time

Nmax = 10_000
r, c = 1_000_000, 3

# Matrix = np.random.randint(1, Nmax, size=(r,c))
Matrix = np.arange(r*c).reshape(r, c)

# for all the matrix except the first row
t0=time.time()
D = np.zeros((r, 1), dtype=int)
i = np.arange(1, r)
D[i, 0] = Matrix[i-1, 2] - Matrix[i, 0] + Matrix[i, 1]

# specifically for the first row
i = 0
D[i, 0] = Matrix[i, 2] + Matrix[i, 0] - Matrix[i, 1]
t1 = time.time()
print(f"Duration1 = {t1-t0}")


# using loops
t2=time.time()
D2 = np.zeros((r, 1), dtype=int)
for i in range(1, r):
    D2[i, 0] = Matrix[i-1, 2] - Matrix[i, 0] + Matrix[i, 1]
D2[0, 0] = Matrix[0, 2] + Matrix[0, 0] - Matrix[0, 1]
t3=time.time()

print(f"D equals D2? => {np.array_equal(D, D2)}")
print(f"Duration2 = {t3-t2}")
print(f"Duration's ratio = {(t3-t2)/(t1-t0)}")
Reply
#9
Thanks mate!
Oh yes, I am aware of the term "vectorization." Could you kindly demonstrate how to use loops to accomplish the same thing? Then, I think, it will be easy to compare and understand both approaches.
Reply

