Posts: 22
Threads: 5
Joined: Dec 2022
Dec-12-2022, 11:32 AM
(This post was last modified: Dec-12-2022, 11:39 AM by arvin.)
Hello
I want to create a new column that includes calculations based on existing columns.
All rows use the same calculation except row 1.
I hope I have explained clearly what I want.
Thanks in advance.
Posts: 299
Threads: 72
Joined: Apr 2019
I cannot see your attachment, so it is difficult to help you.
Posts: 22
Threads: 5
Joined: Dec 2022
How can I share the image here?
Posts: 1,358
Threads: 2
Joined: May 2019
I suggest using the following format (note: use the Python delimiters to display code):
import pandas as pd
df = pd.DataFrame(data=[[1,2,3],[4,5,6],[7,8,9]])
df[3] = df[0]+df[1]
df
Where I define df[3], you could call a function, use a lambda, or define the column any other way you want.
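As a rough illustration of the options mentioned above (a named function or a lambda), here is a minimal sketch on the same 3x3 frame. The helper name `sum_first_two` and the extra column labels 4 and 5 are made up for this example; they are not part of the original suggestion.

```python
import pandas as pd

df = pd.DataFrame(data=[[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Hypothetical helper: receives one row and returns the new value
def sum_first_two(row):
    return row[0] + row[1]

# Direct column arithmetic (vectorized, usually the fastest option)
df[3] = df[0] + df[1]
# Same result through a named function applied row by row
df[4] = df.apply(sum_first_two, axis=1)
# Same result through a lambda
df[5] = df.apply(lambda row: row[0] + row[1], axis=1)

print(df)
```

All three new columns should hold the same values; the row-by-row variants are mainly useful when the calculation cannot be written as plain column arithmetic.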
Posts: 299
Threads: 72
Joined: Apr 2019
Dec-12-2022, 05:13 PM
(This post was last modified: Dec-12-2022, 05:13 PM by paul18fr.)
I do not see any condition?
If I correctly understand what you're writing (index means row), then the following snippet works.
Of course, you can replace "Matrix" with your own "A", "B" and "C".
import numpy as np
import time
Nmax = 10_000
r, c = 1_000_000, 3
Matrix = np.arange(r*c).reshape(r, c)
# for all the matrix except the first row
t0=time.time()
D = np.zeros((r, 1))
i = np.arange(1, r)
D[i, 0] = Matrix[i-1, 2] - Matrix[i, 0] + Matrix[i, 1]
# only for the first row
i = 0
D[i, 0] = Matrix[i, 2] + Matrix[i, 0] - Matrix[i, 1]
D = D.astype(int)
t1 = time.time()
print(f"Duration = {t1-t0}")
print(f"D = {D}")
Output:
D = [[ 1]
[ 3]
[ 6]
...
[2999991]
[2999994]
[2999997]]
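If the data lives in a pandas DataFrame rather than a NumPy array, the same idea can probably be written with `shift`. This is only a sketch: the column names A, B, C, D and the sample values are assumptions, and the two formulas are copied from the snippet above, not from the original attachment.

```python
import pandas as pd

# Hypothetical sample data; replace with your own columns
df = pd.DataFrame({"A": [0, 3, 6, 9], "B": [1, 4, 7, 10], "C": [2, 5, 8, 11]})

# Rows 1..n-1: previous row's C minus current A plus current B
df["D"] = df["C"].shift(1) - df["A"] + df["B"]

# Row 0 has no previous row (shift leaves NaN there), so it gets its own formula
first = df.index[0]
df.loc[first, "D"] = df.loc[first, "C"] + df.loc[first, "A"] - df.loc[first, "B"]

# shift introduced NaN, which made the column float; convert back to int
df["D"] = df["D"].astype(int)
print(df)
```

`shift(1)` moves column C down by one row, so each row sees the previous row's value without an explicit loop.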
Posts: 22
Threads: 5
Joined: Dec 2022
Could you kindly explain the code?
Posts: 299
Threads: 72
Joined: Apr 2019
I'm using a "vectorized" form here instead of loops; vectorization is much faster, as you'll notice.
Remarks:
- The Matrix array is just an example; you can use your own A, B, C instead
- D is initialized up front to avoid dynamic (memory) allocation; it's good practice
- i is the "index" vector (from 1 to r-1)
- the expression for D follows directly from your formula
- finally, the first row corresponds to index i=0
- if you're working only with integers, it would have been more relevant in my previous post to define the dtype for D/D2 directly
- since the first row uses a different formula, it can be calculated before or after the main block
Test the following code to compare vectorization with (classical) loops => you'll figure out how it works.
Hope it helps
import numpy as np
import time
Nmax = 10_000
r, c = 1_000_000, 3
# Matrix = np.random.randint(1, Nmax, size=(r,c))
Matrix = np.arange(r*c).reshape(r, c)
# for all the matrix except the first row
t0=time.time()
D = np.zeros((r, 1), dtype=int)
i = np.arange(1, r)
D[i, 0] = Matrix[i-1, 2] - Matrix[i, 0] + Matrix[i, 1]
# specifically for the first row
i = 0
D[i, 0] = Matrix[i, 2] + Matrix[i, 0] - Matrix[i, 1]
t1 = time.time()
print(f"Duration1 = {t1-t0}")
# using loops
t2=time.time()
D2 = np.zeros((r, 1), dtype=int)
for i in range(1, r):
    D2[i, 0] = Matrix[i-1, 2] - Matrix[i, 0] + Matrix[i, 1]
D2[0, 0] = Matrix[0, 2] + Matrix[0, 0] - Matrix[0, 1]
t3=time.time()
print(f"D equals D2? => {np.array_equal(D, D2)}")
print(f"Duration2 = {t3-t2}")
print(f"Duration's ratio = {(t3-t2)/(t1-t0)}")
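To see what the index vector is doing, it may help to run both versions on a tiny array and look at the numbers. The sketch below uses plain slices (`M[:-1, 2]`, `M[1:, 0]`) in place of the index vector `i`; that is an alternative way of writing the same selection, not the code from this post, and the 4x3 array `M` is made up.

```python
import numpy as np

# Small made-up array: row k is [3k, 3k+1, 3k+2]
M = np.arange(12).reshape(4, 3)

# Vectorized: M[:-1, 2] is column 2 of rows 0..2 (the "previous" rows),
# M[1:, 0] and M[1:, 1] are columns 0 and 1 of rows 1..3 (the "current" rows)
D = np.zeros(4, dtype=int)
D[1:] = M[:-1, 2] - M[1:, 0] + M[1:, 1]
D[0] = M[0, 2] + M[0, 0] - M[0, 1]

# Loop: the same arithmetic, one row at a time
D_loop = np.zeros(4, dtype=int)
for k in range(1, 4):
    D_loop[k] = M[k - 1, 2] - M[k, 0] + M[k, 1]
D_loop[0] = M[0, 2] + M[0, 0] - M[0, 1]

print(D, D_loop)
```

Printing the intermediate slices (for example `M[:-1, 2]`) is a quick way to check that "previous row" and "current row" line up as intended.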
Posts: 22
Threads: 5
Joined: Dec 2022
Dec-13-2022, 10:36 AM
(This post was last modified: Dec-13-2022, 10:36 AM by arvin.)
Thanks mate!
Oh yes, I am aware of the term "vectorization." Could you kindly demonstrate how to use loops to accomplish the same thing? Then, I think, it will be easy to compare and understand both approaches.