Python Forum

Full Version: pandas dataframe next rows value
Dear Python Experts,
I am looking at Daniel Breen's project* about housing prices and GDP in the US.
About halfway through, at In [37], he does the following:

economy_df['Next Quarter GDP'] = list(economy_df['GDP (billions)'].iloc[1:]) + [np.NAN]
economy_df['Two Quarters GDP'] = list(economy_df['GDP (billions)'].iloc[2:]) + 2*[np.NAN]

It seems that the first line takes the value from one row down and puts it in a new column, while
the second line takes the value from two rows down and puts it in another column.
Code-wise I don't really understand it. Why the list? What does the np.NAN do?
When I take the list(...) away, the whole thing no longer shifts by 1 or 2 rows.
Is there another way to achieve the same functionality?

Many thanks for any ideas, and have a great weekend.

*http://danielbreen.net/projects/housing_prices_college_towns/
He wants to shift/lag GDP so that each row holds both the current value and the value from the next record.

So he takes df['GDP'] and uses iloc to drop the first value. He can't simply assign the result as a new column (well, he can, but it won't do what he wants): the sliced Series still carries the same index labels as df, so direct assignment aligns on the index and puts every value back on its original row, leaving NaN in the first row.
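To see the alignment problem for yourself, here is a minimal sketch. The column name "Direct" and the GDP numbers are made up for illustration; they are not from the project.

```python
import numpy as np
import pandas as pd

# Hypothetical GDP figures, invented for illustration only.
df = pd.DataFrame({"GDP (billions)": [100.0, 110.0, 120.0, 130.0]})

# Dropping the first value keeps the original index labels 1, 2, 3.
tail = df["GDP (billions)"].iloc[1:]

# Assignment aligns on index labels, so each value lands back on its own row,
# and row 0, which has no matching label, becomes NaN.
df["Direct"] = tail
print(df)
```

Nothing has moved: apart from the NaN in the first row, "Direct" is just a copy of the original column.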

That's why he "removes" the index by converting to a list and pads it with np.NaN to the same length as the original df. After that he can assign it as a new column, because a plain list has no index and is assigned by position. When you remove list(), the + no longer concatenates two lists; adding a pd.Series and [np.NaN] is arithmetic, which results in a pd.Series where np.NaN is added to each value.

And yes, this is unnecessarily complicated. Since shifting/lagging is very common, pandas provides the shift() method, which does it directly.

Example dataframe:
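The original listing did not survive in this stripped-down view, so what follows is only a stand-in: a small frame with invented GDP values and hypothetical quarter labels, using the same column name as in the question.

```python
import pandas as pd

# Hypothetical quarterly GDP values and labels, invented for illustration.
economy_df = pd.DataFrame(
    {"GDP (billions)": [100.0, 110.0, 120.0, 130.0, 140.0]},
    index=["2000q1", "2000q2", "2000q3", "2000q4", "2001q1"],
)
print(economy_df)
```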

His way:
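A sketch of the list-based approach from the question, run on the same invented data. One deliberate change: I write np.nan instead of np.NAN, because the uppercase alias was removed in NumPy 2.0.

```python
import numpy as np
import pandas as pd

# Hypothetical quarterly GDP values and labels, invented for illustration.
economy_df = pd.DataFrame(
    {"GDP (billions)": [100.0, 110.0, 120.0, 130.0, 140.0]},
    index=["2000q1", "2000q2", "2000q3", "2000q4", "2001q1"],
)

gdp = economy_df["GDP (billions)"]

# list() drops the index; the trailing NaNs pad the list back to full length,
# so it can be assigned by position.
economy_df["Next Quarter GDP"] = list(gdp.iloc[1:]) + [np.nan]
economy_df["Two Quarters GDP"] = list(gdp.iloc[2:]) + 2 * [np.nan]
print(economy_df)
```

Note that 2 * [np.nan] is plain Python list repetition, so it yields a two-element list of NaNs.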

Simpler way:
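A sketch with shift() on the same invented data. A negative periods argument pulls values up from later rows, and the rows left empty at the end are filled with NaN.

```python
import pandas as pd

# Hypothetical quarterly GDP values and labels, invented for illustration.
economy_df = pd.DataFrame(
    {"GDP (billions)": [100.0, 110.0, 120.0, 130.0, 140.0]},
    index=["2000q1", "2000q2", "2000q3", "2000q4", "2001q1"],
)

# shift(-1) moves every value up one row, shift(-2) up two rows.
economy_df["Next Quarter GDP"] = economy_df["GDP (billions)"].shift(-1)
economy_df["Two Quarters GDP"] = economy_df["GDP (billions)"].shift(-2)
print(economy_df)
```

The result keeps the original index, so there is no need to go through a list and pad it by hand.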
Incredible! You are the best, zivoni!
I had no clue about .shift(), or that this operation is called shifting or lagging.