Python Forum
FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated
#1
import pandas as pd

# df has already been read in from an Excel spreadsheet via pd.read_excel()

wtd = pd.DataFrame(columns=df.columns)

indexarr = []
appender = 0

for index in range(1, df.shape[0]):

    if df.iat[index, 1] == df.iat[index - 1, 1]:
        # This row repeats the previous row's value in column 1
        indexarr.append(index - 1)
        appender = 1
        wtd = wtd._append(df.iloc[index - 1], ignore_index=True)
    elif appender:
        # End of a run of duplicates: append the last row of the run,
        # report, then reset for the next run
        indexarr.append(index - 1)
        wtd = wtd._append(df.iloc[index - 1], ignore_index=True)
        appender = 0
        print(indexarr)
        print(wtd)
        indexarr = []
        wtd = pd.DataFrame(columns=df.columns)
So, full disclosure - I'm a Python newbie here. This is my first time working with Python. In a nutshell, df is a pandas dataframe that I've read in from an Excel spreadsheet, and wtd is an empty dataframe with the same columns as df. What my for loop does is parse the df dataframe, and wherever a particular column has duplicate values in consecutive rows, I want to identify those rows and build a new dataframe wtd composed of them. indexarr is an array I put in there just to help me keep track of things. For example:

Output:
[6, 7, 8]
[9, 10, 11, 12, 13]
[15, 16]
[19, 20]
[22, 23]
[24, 25]
[29, 30, 31, 32]
[33, 34]
Each one of those is an array of the indices of consecutive rows that share the same value in the specified column. I want to copy each ENTIRE row from dataframe df into a new dataframe wtd - the indices in wtd can just be 0, 1, 2, etc., that's not important - and wtd gets deleted and remade as blank after each "set". What I intend to do is pass wtd to a function and get a return value, but I don't have that part coded in yet.

My code works, but I get the warning
Error:
FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  wtd = wtd._append(df.iloc[index-1], ignore_index=True)
I have absolutely no idea how to fix this. I would like to write proper code and have this fixed. So what do I do? Thanks.
#2
You should use vectorization instead of iterating. Read about vectorization here:

https://python.plainenglish.io/vectoriza...4fda08a184

In your example I would use vectorization to create a boolean column/sequence marking the rows to keep (B[i] != B[i+1]) and make a new dataframe that contains the rows in df that have a True in that non-match column/sequence. Like this:
import pandas as pd

# Make a dataframe to filter with some duplicate consecutive values in B
df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6, 7], "B": list("ABBCADD")})
# Make a new dataframe from df keeping rows where B[i] != B[i+1],
# i.e. the last row of each run of consecutive duplicates
wtd = df.loc[df.B != df.B.shift(-1)]
print(wtd)
Output:
   A  B
0  1  A
2  3  B
3  4  C
4  5  A
6  7  D
Pandas programs should be short and concise. If they aren't, it usually means you are missing an opportunity to use vectorization.
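Separately, if you do keep the loop for now: the FutureWarning is triggered by appending rows onto a dataframe that starts out empty (wtd). The usual way around it is to collect the rows in a plain Python list and build the dataframe once at the end. A minimal sketch, with a made-up two-column frame standing in for your real df:

```python
import pandas as pd

# Made-up stand-in for the real df read from Excel
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": list("XXYY")})

# Collect matching rows in a list; build the dataframe once at the
# end, so we never concatenate onto an empty all-NA frame.
rows = [df.iloc[i] for i in range(1, df.shape[0])
        if df.iat[i, 1] == df.iat[i - 1, 1]]
wtd = pd.DataFrame(rows).reset_index(drop=True)
print(wtd)
```

Same selection logic as your loop, but only one dataframe construction, so the warning never fires.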
#3
I see what your vectorized code did - and I like it. But I'm not sure I'd be able to vectorize what I need to do -

So, a bit more context. My repetitive values are Unix timestamps, and yes, they are ordered chronologically. Some rows have a unique timestamp, but sometimes multiple rows share the same timestamp, and when they do, they will be sequential because the data is chronological. However, I can't just throw out all the rows that share a timestamp - I need to take all the rows with the same timestamp and put them into a new dataframe. Then I pass that dataframe to a function that executes a weighted average calculation (where I actually used vectorization without knowing it!), and the function combines the multiple rows into a single row. Then I need to build a new dataframe where the time value in each row is unique - and the whole set needs to be chronological.

Is there a vector operation that can just create a new dataframe for me out of all rows that have the same element in a specified column? I would need a separate dataframe for each 'set'.

If I did it like this, I guess I'd also need another iterative loop to reinsert the data? Whereas if I just iterate right off the bat, I can run the function and build the new dataframe immediately.

Thoughts? If I'm still best off iterating, I'd still like to get rid of the future warning message. If there is a better way to do this, any advice is appreciated.
#4
You should look at pandas groupby
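groupby splits the dataframe into one set of rows per timestamp and lets you collapse each set in one shot, so there is no need to build and delete a temporary dataframe per set. A minimal sketch of a weighted average - the column names ts, price and vol are invented for the example, substitute your own:

```python
import pandas as pd

# Toy data: some rows share a timestamp in "ts"
df = pd.DataFrame({
    "ts":    [100, 200, 200, 300, 300, 300],
    "price": [10.0, 11.0, 13.0, 20.0, 22.0, 24.0],
    "vol":   [1, 1, 3, 2, 1, 1],
})

# Sum price*vol and vol within each timestamp, then divide:
# one weighted-average row per timestamp, no explicit loop.
g = (df.assign(pv=df["price"] * df["vol"])
       .groupby("ts", as_index=False)
       .agg(pv=("pv", "sum"), vol=("vol", "sum")))
g["price"] = g["pv"] / g["vol"]
out = g[["ts", "price", "vol"]]
print(out)
```

Because groupby sorts by the key by default, the result comes back with one unique timestamp per row, in chronological order.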

