Python Forum
FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated
#1
import pandas as pd

# df has already been read in from an Excel spreadsheet via pd.read_excel()

wtd = pd.DataFrame(columns=df.columns)

indexarr = []
appender = 0

for index in range(1, df.shape[0]):

    if df.iat[index, 1] == df.iat[index - 1, 1]:
        # This row repeats the previous row's value in column 1
        indexarr.append(index - 1)
        appender = 1
        wtd = wtd._append(df.iloc[index - 1], ignore_index=True)
    elif appender:
        # End of a run of duplicates: append the last row of the run,
        # report, then reset for the next run
        indexarr.append(index - 1)
        wtd = wtd._append(df.iloc[index - 1], ignore_index=True)
        appender = 0
        print(indexarr)
        print(wtd)
        indexarr = []
        wtd = pd.DataFrame(columns=df.columns)
So, full disclosure - I'm a Python newbie here. This is my first time working with Python. In a nutshell, df is a pandas dataframe that I've read in from an Excel spreadsheet, and wtd is an empty dataframe with the same columns as df. What my for loop does is parse the df dataframe, and wherever a particular column has duplicate values in consecutive rows, I want to identify those rows and build a new dataframe wtd composed of them. indexarr is an array I put in there just to help me keep track of things. For example:

Output:
[6, 7, 8]
[9, 10, 11, 12, 13]
[15, 16]
[19, 20]
[22, 23]
[24, 25]
[29, 30, 31, 32]
[33, 34]
Each one of those is an array of the indices of consecutive rows that share the same value in the specified column. I want to copy each ENTIRE row from dataframe df into a new dataframe wtd - the indices in wtd can just be 0, 1, 2, etc., that's not important - and wtd gets deleted and remade as blank after each "set". What I intend to do is pass wtd to a function and get a return value, but I don't have that part coded in yet.

My code works, but I get the warning
Error:
FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  wtd = wtd._append(df.iloc[index-1], ignore_index=True)
I have absolutely no idea how to fix this. I would like to write proper code and have this fixed. So what do I do? Thanks.
#2
You should use vectorization instead of iterating. Read about vectorization here:

https://python.plainenglish.io/vectoriza...4fda08a184

In your example I would use vectorization to create a boolean column/sequence marking the rows to keep (B[i] != B[i+1]) and make a new dataframe that contains the rows in df that have a True in that non-match column/sequence. Like this:
import pandas as pd

# Make a dataframe to filter with some duplicate consecutive values in B
df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6, 7], "B": list("ABBCADD")})
# Make a new dataframe from df keeping rows where B[i] != B[i+1],
# i.e. the last row of each run of consecutive duplicates
wtd = df.loc[df.B != df.B.shift(-1)]
print(wtd)
Output:
   A  B
0  1  A
2  3  B
3  4  C
4  5  A
6  7  D
Pandas programs should be short and concise. If they aren't, it usually means you are missing an opportunity to use vectorization.
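Separately, if you do keep the loop for now: the FutureWarning is triggered by appending rows onto a dataframe that starts out empty (wtd). The usual way around it is to collect the rows in a plain Python list and build the dataframe once at the end. A minimal sketch, with a made-up two-column frame standing in for your real df:

```python
import pandas as pd

# Made-up stand-in for the real df read from Excel
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": list("XXYY")})

# Collect matching rows in a list; build the dataframe once at the
# end, so we never concatenate onto an empty all-NA frame.
rows = [df.iloc[i] for i in range(1, df.shape[0])
        if df.iat[i, 1] == df.iat[i - 1, 1]]
wtd = pd.DataFrame(rows).reset_index(drop=True)
print(wtd)
```

Same selection logic as your loop, but only one dataframe construction, so the warning never fires.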
#3
I see what your vectorized code did - and I like it. But I'm not sure I'd be able to vectorize what I need to do -

So, a bit more context. My repetitive values are Unix timestamps, and yes, they are ordered chronologically. Some rows have a unique timestamp, but sometimes multiple rows share the same timestamp, and when they do, they will be sequential because the data is chronological. However, I can't just throw out all the rows that share a timestamp - I need to take all the rows with the same timestamp and put them into a new dataframe. Then I pass that dataframe to a function that executes a weighted average calculation (where I actually used vectorization without knowing it!), and the function combines the multiple rows into a single row. Then I need to build a new dataframe where the time value in each row is unique - and the whole set needs to be chronological.

Is there a vector operation that can just create a new dataframe for me out of all rows that have the same element in a specified column? I would need a separate dataframe for each 'set'.

If I did it like this, I guess I'd also need another iterative loop to reinsert the data? Whereas if I just iterate right off the bat, I can run the function and build the new dataframe immediately.

Thoughts? If I'm still best off iterating, I'd still like to get rid of the future warning message. If there is a better way to do this, any advice is appreciated.
#4
You should look at pandas groupby
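groupby splits the dataframe into one set of rows per timestamp and lets you collapse each set in one shot, so there is no need to build and delete a temporary dataframe per set. A minimal sketch of a weighted average - the column names ts, price and vol are invented for the example, substitute your own:

```python
import pandas as pd

# Toy data: some rows share a timestamp in "ts"
df = pd.DataFrame({
    "ts":    [100, 200, 200, 300, 300, 300],
    "price": [10.0, 11.0, 13.0, 20.0, 22.0, 24.0],
    "vol":   [1, 1, 3, 2, 1, 1],
})

# Sum price*vol and vol within each timestamp, then divide:
# one weighted-average row per timestamp, no explicit loop.
g = (df.assign(pv=df["price"] * df["vol"])
       .groupby("ts", as_index=False)
       .agg(pv=("pv", "sum"), vol=("vol", "sum")))
g["price"] = g["pv"] / g["vol"]
out = g[["ts", "price", "vol"]]
print(out)
```

Because groupby sorts by the key by default, the result comes back with one unique timestamp per row, in chronological order.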

