Python Forum
Dataframe copy warning
#1
I load a dataframe, which I call 'df', into Python from an Excel file.

Next I create a new dataframe, which I call 'tempdf', using boolean indexing, so tempdf is a subset of df.

Two columns in tempdf are price and size. I use an operation that creates a new column called weightmul, which is price * size.

tempdf['weightmul'] = tempdf.apply(lambda row: row['price'] * row['size'], axis = 1)
Python doesn't like this. It gives me this warning:

Error:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
Now, my vector operation does exactly what I want - it creates a new column called weightmul, which is the product of price * size. Now, if I run this operation on the original dataframe, df, it's happy. But if it's a subset dataframe, tempdf which was created using indexing, I get the warning.

So is there a better way to do what I'm trying to do? Python tells me to "Try using .loc[row_indexer,col_indexer] = value instead", but I really don't know what it means by that.

Thank you.
#2
Maybe try like this:

import pandas as pd

xlfile = '/home/pedro/myPython/pandas/xl_files/weight_X_price.xlsx'
df = pd.read_excel(xlfile)
# get a subset based on condition
boolean_series = df['weight'] > 4
filtered_data = df[boolean_series]
result = filtered_data.copy()
values = result.price * result.size
# assigning to filtered_data directly triggers the warning:
# filtered_data['weighmul'] = values
# so assign to the copy instead
result['weighmul'] = values
# set value with a condition
result['Values'] = values.where(result.price > 5, other=-values)
Gives:

Output:
result
   weight  price  size  weighmul  Values
4       5      6     5       108     108
5       6      5     6        90     -90
6       7      4     7        72     -72
7       8      3     8        54     -54
8       9      2     9        36     -36
9      10      1    10        18     -18
#3
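Sorry, the arithmetic was off! .size is an existing DataFrame attribute, so attribute access does not reach the column. I wasn't aware of that!

result.size gives the number of elements in the dataframe (rows * columns), not the 'size' column. Use result['size'] instead: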

values = result.price * result['size']
This works for me:

import pandas as pd

xlfile = '/home/pedro/myPython/pandas/xl_files/weight_X_price.xlsx'
df = pd.read_excel(xlfile)
boolean_series = df['weight'] > 4
filtered_data = df[boolean_series]
result = filtered_data.copy()
values = result.price * result['size']
result['weighmul'] = values
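
To make the difference between result.size and result['size'] concrete, here is a minimal sketch. The small frame is made-up data standing in for the Excel file (an assumption, not the original spreadsheet), and the variable names are only for illustration.

```python
import pandas as pd

# Hypothetical data standing in for the spreadsheet used in the thread.
df = pd.DataFrame({"weight": [5, 6, 7], "price": [6, 5, 4], "size": [5, 6, 7]})

# DataFrame.size is a built-in attribute: number of rows * number of columns.
element_count = df.size

# Attribute access hits the built-in attribute, so this multiplies
# every price by the same scalar (the element count).
wrong = df["price"] * df.size

# Bracket access always returns the column named "size",
# so this is the element-wise product that was intended.
right = df["price"] * df["size"]

print(element_count)
print(wrong.tolist())
print(right.tolist())
```

Bracket access is the safer habit for any column whose name collides with an existing DataFrame attribute or method.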
#4
When posting, please provide a runnable example that demonstrates the problem.

There is no need to use apply. Instead of this:
tempdf['weightmul'] = tempdf.apply(lambda row: row['price'] * row['size'], axis = 1)
Do this:
tempdf['weightmul'] = tempdf['price'] * tempdf['size']
But that doesn't "fix" your problem. Did you read the link in the warning message? If not, I suggest you do.

https://pandas.pydata.org/pandas-docs/st...sus-a-copy

It turns out that what you are doing is not a problem, but it looks like something else that is/was a problem, called "chained assignment" (see the link for a description). A future pandas release is expected to make "copy on write" the default, which eliminates the potential problem, and you will no longer get a warning when running your program. Until then, you can use DataFrame.copy, or tell pandas to use "copy on write" (also described in the link). Run the code below, then comment out line 3 (the copy_on_write line) and run again.
import pandas as pd

pd.options.mode.copy_on_write = True  # Turn on copy on write.  Soon to be default

df = pd.DataFrame({"price": [4, 5, 6, 7, 8], "size": [1, 2, 3, 1, 2], "keep": [0, 1, 1, 1, 0]})
df2 = df[df.keep == 1]
df2["weightmul"] = df2["price"] * df2["size"]  # use ["size"]: df2.size is the element count
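
As for what the ".loc[row_indexer,col_indexer] = value" hint in the warning means, here is a minimal sketch of two common ways to avoid the warning without changing any pandas options. It uses the same small example frame; the names mask and subset are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"price": [4, 5, 6, 7, 8], "size": [1, 2, 3, 1, 2], "keep": [0, 1, 1, 1, 0]})

# Boolean Series used as the row indexer.
mask = df["keep"] == 1

# Option 1: take an explicit copy of the subset, then add the column to the copy.
# The original df is left untouched.
subset = df[mask].copy()
subset["weightmul"] = subset["price"] * subset["size"]

# Option 2: write into the original frame with .loc[row_indexer, col_indexer].
# Rows where mask is False get NaN in the new column.
df.loc[mask, "weightmul"] = df.loc[mask, "price"] * df.loc[mask, "size"]

print(subset)
print(df)
```

Option 1 fits when the subset is what you want to keep working with; option 2 fits when the result should live in the original dataframe.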
#5
Thank you so much for the help Dean, it worked!

