Python Forum

Full Version: Fix pandas copy/slice warning.
In another thread I posted this code:
import pandas as pd
from string import ascii_letters as letters
from random import choice, choices, randint
 
 
def find_supplier(description):
    """Return word if word in description matches a supplier code, else None."""
    intersection = set(description.split()) & suppliers
    return list(intersection)[0] if intersection else None
 
 
# Make some random table thing that we can use to search for words in the description
# that match a supplier code.
product_table = pd.DataFrame(
    [
        {
            "Product": i,
            "Supplier Code": choice("ABCDE"),
            "Description": " ".join(choices(letters, k=randint(5, 10))),
        }
        for i in range(100, 120)
    ]
)
 
# Get set of suppliers.
suppliers = set(product_table["Supplier Code"].values)
 
# Make supplier table.  Supplier table contains rows from product_table
# where one of the words in the description matches a supplier code.
supplier_table = product_table[["Description"]]
supplier_table["Product"] = supplier_table["Description"].map(find_supplier)
supplier_table = supplier_table[~supplier_table["Product"].isna()][
    ["Product", "Description"]
]
print(supplier_table)
When I run it I get a warning.
Error:
...test.py:31: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  supplier_table["Product"] = product_table["Description"].apply(find_supplier)
I've seen this message before. In other cases the chaining was obvious and easy to fix. Here I cannot see the chaining, and I have no idea how to fix it.
Probably this won't help you, I don't know pandas. I was just looking at your code, trying to learn a bit about pandas.

If I assign:

description = supplier_table["Description"] 
type(description)
<class 'pandas.core.series.Series'>
Try split() on description:

description.split()
Traceback (most recent call last):
  File "/usr/lib/python3.10/idlelib/run.py", line 578, in runcode
    exec(code, self.locals)
  File "<pyshell#35>", line 1, in <module>
  File "/home/pedro/.local/lib/python3.10/site-packages/pandas/core/generic.py", line 5902, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'split'. Did you mean: 'plot'?
This works however, using your function find_supplier(description) :

supplier_table["Description"].map(find_supplier)
0     B
3     D
4     A
6     B
7     C
9     D
11    E
14    D
16    D
18    D
19    E
Name: Description, dtype: object
This, oddly, does not work, with the same AttributeError: 'Series' object has no attribute 'split'. Did you mean: 'plot'?:

intersection = set(description.split()) & suppliers
Traceback (most recent call last):
  File "/usr/lib/python3.10/idlelib/run.py", line 578, in runcode
    exec(code, self.locals)
  File "<pyshell#39>", line 1, in <module>
  File "/home/pedro/.local/lib/python3.10/site-packages/pandas/core/generic.py", line 5902, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'split'. Did you mean: 'plot'?
So, because your code works, you CAN split() description, AND you shouldn't be able to split() description!
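A likely explanation for the apparent contradiction: Series.map() calls the function once per element, so inside find_supplier() the description argument is a plain str, not the Series. The sketch below uses a made-up two-row Series and a hypothetical helper, show_type(), just to make that visible. Splitting every element of the Series itself goes through the .str accessor.

```python
import pandas as pd

# Made-up data for illustration; not the table from the original post.
descriptions = pd.Series(["a B c", "x y z"])


def show_type(value):
    """Hypothetical helper: report the type of whatever map() passes in."""
    return type(value).__name__


# The object itself is a Series, which has no split() method.
container_type = type(descriptions).__name__

# map() hands each element to the function, so the function sees str values.
element_types = descriptions.map(show_type).tolist()

# To split every element of the Series, use the .str accessor.
split_words = descriptions.str.split().tolist()

print(container_type)
print(element_types)
print(split_words)
```

So description.split() fails when description is the whole column, but works inside the mapped function, where description is one cell.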
Try changing it to this.
pandas raises SettingWithCopyWarning when you assign to a DataFrame that was created by slicing another DataFrame, because it cannot tell whether that slice is a view or a copy.
Calling .copy() on the slice before modifying it should avoid the SettingWithCopyWarning.
This ensures that you are working on a new DataFrame rather than inadvertently modifying the original one.
# Make supplier table.  Supplier table contains rows from product_table
# where one of the words in the description matches a supplier code.
supplier_table = product_table[["Description"]].copy()
supplier_table["Product"] = supplier_table["Description"].map(find_supplier)
supplier_table = supplier_table[~supplier_table["Product"].isna()]
supplier_table = supplier_table[["Product", "Description"]]
print(supplier_table)
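Another way to sidestep the warning, sketched here as an alternative rather than as the fix above, is to avoid assigning into the slice at all: DataFrame.assign() returns a new DataFrame with the extra column, and dropna() filters out the rows without a match. The three-row table is made up for the example, and find_supplier() is rewritten slightly with next(iter(...)).

```python
import pandas as pd

# Small made-up table so the result is predictable.
product_table = pd.DataFrame(
    {
        "Product": [100, 101, 102],
        "Supplier Code": ["A", "B", "A"],
        "Description": ["x A y", "p q r", "B z"],
    }
)
suppliers = set(product_table["Supplier Code"])


def find_supplier(description):
    """Return a word in description that matches a supplier code, else None."""
    intersection = set(description.split()) & suppliers
    return next(iter(intersection), None)


# assign() builds a new DataFrame, so nothing is set on a slice.
supplier_table = (
    product_table[["Description"]]
    .assign(Product=product_table["Description"].map(find_supplier))
    .dropna(subset=["Product"])[["Product", "Description"]]
)
print(supplier_table)
```

Only the rows whose description contains a supplier code remain, and they keep their original index labels.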
@snippsat: Duh! So obvious. I was already thinking of supplier_table as a separate dataframe and could not figure out why using the map() function generated a warning. Thanks!