Hi,
I have a DataFrame with 10K rows, and iterating over it with iterrows is slow, so I switched to itertuples and getattr. However, I also need to access the previous row. The code below fails when I try to do that. Can anyone help me access the previous row by index?
import pandas as pd
d = {'col1': ['A', 'B', 'C', 'D'], 'col2': [1, 2, 3, 4]}
df = pd.DataFrame(data=d)
for idx, row in enumerate(df.itertuples(), 1):
    print("Current index:", row)
    print("current col2 value:", getattr(row, 'col2'))
    print("Previous col2 value:", getattr(df[idx-1], 'col2'))
The error is:
raise KeyError(key) from err
KeyError: 0
If you start at the first row there is no previous row.
You can hang on to the previous row and print the previous row after you get the second row.
import pandas as pd
d = {'col1': ['A', 'B', 'C', 'D'], 'col2': [1, 2, 3, 4]}
df = pd.DataFrame(data=d)
prev = None
for row in df.itertuples():
    print("Current index:", row)
    print("current col2 value:", getattr(row, 'col2'))
    if prev is not None:
        print("Previous col2 value:", getattr(prev, 'col2'))
    prev = row
Or you can start printing at the second row.
import pandas as pd
d = {'col1': ['A', 'B', 'C', 'D'], 'col2': [1, 2, 3, 4]}
df = pd.DataFrame(data=d)
rows = df.itertuples()
prev = next(rows)  # Gets first row
for row in rows:  # Will start at second row
    print("Current index:", row)
    print("current col2 value:", getattr(row, 'col2'))
    print("Previous col2 value:", getattr(prev, 'col2'))
    prev = row
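As a side note on the KeyError in the original code: indexing a DataFrame with square brackets, as in df[idx-1], looks up a column label, and this frame has no column labelled 0. Positional row access goes through .iloc instead. Also, because enumerate starts at 1 there, idx-1 would be the position of the current row, not the previous one. A minimal sketch of the difference, counting positions from 0 (the name previous_values is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'B', 'C', 'D'], 'col2': [1, 2, 3, 4]})

# df[0] is a column lookup; there is no column labelled 0, hence KeyError: 0.
try:
    df[0]
except KeyError as err:
    print("KeyError:", err)

# .iloc selects by position, so the previous row is at position pos - 1.
previous_values = []  # illustrative name, not from the original post
for pos, row in enumerate(df.itertuples()):
    if pos > 0:  # the first row has no previous row
        previous_values.append(int(df['col2'].iloc[pos - 1]))

print(previous_values)
```

This only explains the error. Holding on to the previous tuple, as in the two examples above, avoids the extra lookup on every iteration.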
Just out of curiosity: why iterate over rows when you only need values from one column? You can access a column (a Series) directly, without iterating over the rows.
I think that was for demo purposes. If not, grabbing a column series would be MUCH faster
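To illustrate the column approach: a rough sketch, assuming the goal is to compare each col2 value with the one before it. Series.shift(1) lines every value up with its predecessor without any Python-level loop. The column names prev_col2 and delta are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'B', 'C', 'D'], 'col2': [1, 2, 3, 4]})

# shift(1) moves every value down one row, so each row sits next to the
# value from the row before it. The first row has no predecessor and gets NaN.
df['prev_col2'] = df['col2'].shift(1)

# Example use: difference between the current and previous value.
df['delta'] = df['col2'] - df['prev_col2']

print(df)
```

Note that the shifted column becomes float, because NaN cannot be stored in an integer column.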
Yes, I need every column; I showed just one for simplicity. Is there any other way to access the previous row with good performance, other than itertuples?
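One possible answer when every column is needed: shift the whole frame down by one row and place it next to the original, so each row carries its predecessor's values. This is only a sketch. The prev_ prefix and the name both are assumptions for the example, and whether it beats itertuples on the real data would need measuring.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'B', 'C', 'D'], 'col2': [1, 2, 3, 4]})

# Shift all columns down one row, prefix their names, and glue the
# result next to the original frame.
both = pd.concat([df, df.shift(1).add_prefix('prev_')], axis=1)

# Skip the first row, which has no previous row, and iterate once.
# prev_col2 is float here because the shift introduced a NaN.
for row in both.iloc[1:].itertuples(index=False):
    print(row.col1, row.col2, "| previous:", row.prev_col1, row.prev_col2)
```

If the per-row work can be written as column arithmetic on both, the loop can be dropped entirely.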
I need to iterate over the rows and access the contents of both the previous and the current row on each iteration, not just for the first row.
Both of my examples handle every iteration, not just the first. Run the examples and you will see. The examples differ in how they handle the first iteration.
You can't iterate over all the rows and have the previous row for each iteration. There is no "previous row" for the first iteration. In my first example I handle there not being a previous row by not printing the previous row. In the second example I skip the first row and iterate over the remaining rows. The first row becomes the first "previous row".
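The "skip the first row" variant can also be written by pairing each row with the one after it. A small sketch, assuming it is acceptable to hold all row tuples in memory at once (fine for 10K rows); the name pairs is just for illustration:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'B', 'C', 'D'], 'col2': [1, 2, 3, 4]})

# Materialize the row tuples once, then pair each row with its successor.
rows = list(df.itertuples(index=False))
pairs = list(zip(rows, rows[1:]))  # (previous, current) for rows 2..n

for prev, row in pairs:
    print("current:", row.col1, row.col2, "| previous:", prev.col1, prev.col2)
```

With four rows this yields three pairs, because the first row never appears as a "current" row.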
One way is to access rows by index. Of course, there is still the question of the first row: should it be ignored, or handled some other way?
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(35).reshape(7, 5), columns=[*'abcde'])
for i in range(1, df.shape[0]):
    print("Current row", *df.iloc[i])
    print("Previous row", *df.iloc[i-1])
Output:
Current row 5 6 7 8 9
Previous row 0 1 2 3 4
Current row 10 11 12 13 14
Previous row 5 6 7 8 9
Current row 15 16 17 18 19
Previous row 10 11 12 13 14
Current row 20 21 22 23 24
Previous row 15 16 17 18 19
Current row 25 26 27 28 29
Previous row 20 21 22 23 24
Current row 30 31 32 33 34
Previous row 25 26 27 28 29
I am not motivated enough to find out whether it's faster than itertuples and getattr. I also believe that printing is not the objective, as I can't see any value in printing out 20K rows.
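For anyone who does want to check, here is a rough timing sketch using the standard timeit module. The 10,000-row frame, the helper names with_iloc and with_itertuples, and the per-row work (summing differences in column a) are all assumptions made for the comparison. I would expect the .iloc version to be slower, since each df.iloc[i] call builds a new Series, but the numbers depend on the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# 10,000 rows to match the size mentioned in the question.
df = pd.DataFrame(np.arange(50_000).reshape(10_000, 5), columns=[*'abcde'])


def with_iloc(frame):
    """Sum current-minus-previous differences in column 'a' via .iloc."""
    total = 0
    for i in range(1, frame.shape[0]):
        total += frame.iloc[i]['a'] - frame.iloc[i - 1]['a']
    return total


def with_itertuples(frame):
    """Same computation, holding on to the previous tuple."""
    total = 0
    rows = frame.itertuples(index=False)
    prev = next(rows)  # first row becomes the first "previous row"
    for row in rows:
        total += row.a - prev.a
        prev = row
    return total


print("iloc:      ", timeit.timeit(lambda: with_iloc(df), number=1), "s")
print("itertuples:", timeit.timeit(lambda: with_itertuples(df), number=1), "s")
```

Both helpers compute the same total, so the timings compare like with like.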