Posts: 30
Threads: 10
Joined: May 2021
I have the following code snippet from my script
cleandata["year3paytest"] = cleandata.apply(lambda x: semiannualpayments(x["owneroccupancycode"], x["year3total"], x["county"]), axis=1)
cleandata["year3pay1"] = cleandata["year3paytest"][0][0]
cleandata["year3pay2"] = cleandata["year3paytest"][0][1]
#TODO: enter semiannualpayments in cleandata df
It throws the following error:
Error:
File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 2263, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 2273, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\GitHub\PropertyTax\tanyardsprings.py", line 219, in <module>
cleandata["year3pay1"] = cleandata["year3paytest"][0][0]
~~~~~~~~~~~~~~~~~~~~~~~~~^^^
File "C:\GitHub\PropertyTax\venv\Lib\site-packages\pandas\core\series.py", line 981, in __getitem__
return self._get_value(key)
^^^^^^^^^^^^^^^^^^^^
File "C:\GitHub\PropertyTax\venv\Lib\site-packages\pandas\core\series.py", line 1089, in _get_value
loc = self.index.get_loc(label)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\GitHub\PropertyTax\venv\Lib\site-packages\pandas\core\indexes\base.py", line 3804, in get_loc
raise KeyError(key) from err
KeyError: 0
Process finished with exit code 1
What I am trying to do is split the values into two different columns in the data frame. For example, if the value of year3paytest is (1500, 1505), I want the first value to become year3pay1 and the second to become year3pay2.
This works without issue in a smaller snippet of code that I have for my initial testing. The only things that changed were the dataframe name and the number of rows in that dataframe.
What am I missing here?
Posts: 6,800
Threads: 20
Joined: Feb 2020
Like this?
import pandas as pd
def myfunc(row):
    # .iloc reads by position; plain row[0] on a labeled row is deprecated in newer pandas
    return row.iloc[0] + row.iloc[1], row.iloc[0] * row.iloc[1]
df = pd.DataFrame({"A": range(5), "B": range(10, 15), "C": range(20, 25)})
df[["D", "E"]] = df[["A", "B"]].apply(myfunc, axis=1, result_type='expand')
print(df)
Output:
   A   B   C   D   E
0  0  10  20  10   0
1  1  11  21  12  11
2  2  12  22  14  24
3  3  13  23  16  39
4  4  14  24  18  56
This part creates a dataframe that has only columns "A" and "B".
df[["A", "B"]]
This says we are going to pass the rows of that dataframe (axis=1) to myfunc().
df[["A", "B"]].apply(myfunc, axis=1)
myfunc() returns a tuple with two values. I want to expand this to two columns.
df[["A", "B"]].apply(myfunc, axis=1, result_type='expand')
And I want to add these two new columns to df and call them "D" and "E".
df[["D", "E"]] = df[["A", "B"]].apply(myfunc, axis=1, result_type='expand')
Notice that the argument to myfunc() is a series (row), and I get the values of the series by integer position instead of by index name ("A", "B").
def myfunc(row):
    return row.iloc[0] + row.iloc[1], row.iloc[0] * row.iloc[1]
This lets me apply the same function to any two columns. All I have to do is create a different dataframe (df[["A", "C"]] for example) that is used to supply the rows.
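As a rough illustration of that reuse, the sketch below applies one positional function to two different column pairs. It assumes a recent pandas, which is why it reads values with row.iloc[...]; the names sum_and_product, "F" and "G" are made up for this example and are not from the thread.

```python
import pandas as pd

def sum_and_product(row):
    # Positional access, so the function does not care what the columns are called
    first, second = row.iloc[0], row.iloc[1]
    return first + second, first * second

df = pd.DataFrame({"A": range(5), "B": range(10, 15), "C": range(20, 25)})

# Same function, two different pairs of source columns
df[["D", "E"]] = df[["A", "B"]].apply(sum_and_product, axis=1, result_type="expand")
df[["F", "G"]] = df[["A", "C"]].apply(sum_and_product, axis=1, result_type="expand")
print(df)
```

Only the two-column dataframe that supplies the rows changes between the two calls.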
Alternatively, I could pass the entire row and use column index names inside myfunc() to access the values.
import pandas as pd
def function(row):
    return row["A"] + row["B"], row["A"] * row["B"]
df = pd.DataFrame({"A": range(5), "B": range(10, 15), "C": range(20, 25)})
df[["D", "E"]] = df.apply(function, axis=1, result_type='expand')
print(df)
The result is the same, as it is here, where I pass the column index names as additional arguments.
import pandas as pd
def function(row, a, b):
    return row[a] + row[b], row[a] * row[b]
df = pd.DataFrame({"A": range(5), "B": range(10, 15), "C": range(20, 25)})
df[["D", "E"]] = df.apply(function, args=("A", "B"), axis=1, result_type='expand')
print(df)
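If you want to convince yourself that the three variants really agree, a quick check along these lines should do it. This is only a sketch: the function names by_position, by_name and by_args are invented here, and the positional variant uses row.iloc on the assumption of a recent pandas.

```python
import pandas as pd

def by_position(row):
    return row.iloc[0] + row.iloc[1], row.iloc[0] * row.iloc[1]

def by_name(row):
    return row["A"] + row["B"], row["A"] * row["B"]

def by_args(row, a, b):
    return row[a] + row[b], row[a] * row[b]

df = pd.DataFrame({"A": range(5), "B": range(10, 15), "C": range(20, 25)})

# Each call returns a two-column dataframe (columns 0 and 1) with one row per input row
r1 = df[["A", "B"]].apply(by_position, axis=1, result_type="expand")
r2 = df.apply(by_name, axis=1, result_type="expand")
r3 = df.apply(by_args, args=("A", "B"), axis=1, result_type="expand")

all_equal = r1.equals(r2) and r2.equals(r3)
print(all_equal)
```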
Posts: 30
Threads: 10
Joined: May 2021
I think I got this. My updated code is below:
Function that is being called:
def semiannualpayments (owneroccupied, totalpayment, county):
    from interest import anneinterest
    if (county == "ANNE"):
        if (owneroccupied == "Yes"):
            paymentone = totalpayment / 2
            paymenttwo = totalpayment - paymentone
            paymenttwo = (paymenttwo * (interestrate(county))) + paymenttwo
            return paymentone, paymenttwo
        else:
            paymentone = totalpayment
            paymenttwo = 0
            return paymentone, paymenttwo
    else:
        paymentone = 0
        paymenttwo = 0
        return paymentone, paymenttwo
Main code:
cleandata["year3pay1", "year3pay2"] = cleandata.apply(semiannualpayments,
    args=("owneroccupancycode", "year3total", "county"), axis=1,
    result_type='expand')
I am now getting the following error:
Error: Traceback (most recent call last):
File "C:\GitHub\PropertyTax\tanyardsprings.py", line 230, in <module>
cleandata["year3pay1", "year3pay2"] = cleandata.apply(semiannualpayments,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\GitHub\PropertyTax\venv\Lib\site-packages\pandas\core\frame.py", line 9568, in apply
return op.apply().__finalize__(self, method="apply")
^^^^^^^^^^
File "C:\GitHub\PropertyTax\venv\Lib\site-packages\pandas\core\apply.py", line 764, in apply
return self.apply_standard()
^^^^^^^^^^^^^^^^^^^^^
File "C:\GitHub\PropertyTax\venv\Lib\site-packages\pandas\core\apply.py", line 891, in apply_standard
results, res_index = self.apply_series_generator()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\GitHub\PropertyTax\venv\Lib\site-packages\pandas\core\apply.py", line 907, in apply_series_generator
results[i] = self.f(v)
^^^^^^^^^
File "C:\GitHub\PropertyTax\venv\Lib\site-packages\pandas\core\apply.py", line 142, in f
return func(x, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: semiannualpayments() takes 3 positional arguments but 4 were given
Process finished with exit code 1
Posts: 6,800
Threads: 20
Joined: Feb 2020
Apr-09-2023, 10:39 PM
(This post was last modified: Apr-09-2023, 10:54 PM by deanhystad.)
You are missing important details about how arguments get passed to the function. I'll use this simple example to poke around and see what arguments are passed to the function when using apply() with the "args" keyword.
import pandas as pd
def function(*args):
    print(*map(type, args)) # Print arg types
    return 1, 2
df = pd.DataFrame({"A": [1], "B": [3], "C": [5]})
df[["D", "E"]] = df.apply(function, args=("A", "B"), axis=1, result_type='expand')
Output:
<class 'pandas.core.series.Series'> <class 'str'> <class 'str'>
The first argument passed to the function is a series. The next two arguments are strings. Let's see what the values are.
import pandas as pd
def function(*args):
    print(*args) # print arg values
    return 1, 2
df = pd.DataFrame({"A": [1], "B": [3], "C": [5]})
df[["D", "E"]] = df.apply(function, args=("A", "B"), axis=1, result_type='expand')
Output:
A    1
B    3
C    5
Name: 0, dtype: int64 A B
The first argument is a series. In this case it is one row of the dataframe "df" (the row labeled 0). The next two arguments are the strings "A" and "B". The series is passed as the first argument to the function because that's what apply does: when you "apply" a function to a dataframe with axis=1, each row is passed to the function as a series, one call per row. The strings "A" and "B" are passed because I have args=("A", "B") as an argument to the "apply" function.
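One way to picture this is a hand-written loop that makes one call per row. The sketch below is only an approximation of what apply() does internally, and describe_call is a made-up helper that reports what it received.

```python
import pandas as pd

def describe_call(row, *extra):
    # row is one row of the dataframe as a Series; extra holds whatever was given in args=
    return f"{type(row).__name__} + {extra}"

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

# What the thread uses: apply() supplies the row and appends the args tuple
via_apply = df.apply(describe_call, args=("A", "B"), axis=1)

# Hand-written approximation: the row comes first, then the extra arguments
via_loop = [describe_call(row, "A", "B") for _, row in df.iterrows()]

print(list(via_apply))
print(via_loop)
```

Both lists should hold the same description for every row, which is the point: whatever you put in args= arrives after the row, never instead of it.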
Now look at your code. This is how you call the function.
cleandata["year3pay1", "year3pay2"] = cleandata.apply(semiannualpayments,
    args=("owneroccupancycode", "year3total", "county"), axis=1,
    result_type='expand')
The first argument passed to your function is a series holding one row of "cleandata". You also pass three string arguments: "owneroccupancycode", "year3total" and "county".
Look at your function:
def semiannualpayments (owneroccupied, totalpayment, county):
The first problem is that your function takes three arguments, but you pass four: series, str, str, str.
Looking deeper into your function we see additional problems:
    if (county == "ANNE"):
        if (owneroccupied == "Yes"):
It appears that you think county might be "ANNE" and that owneroccupied might be "Yes". This will never happen. owneroccupied is the first argument. It will be a row of cleandata (a series). county is the third argument. It will be the string "year3total". You also do some math with totalpayment, but totalpayment is the second argument, and will always be the string "owneroccupancycode".
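The argument-count mismatch is easy to reproduce on a one-row frame. This is a minimal sketch with a stand-in function (three_params is a made-up name, and the sample values are invented); it only demonstrates the TypeError, not the payment logic.

```python
import pandas as pd

def three_params(owneroccupied, totalpayment, county):
    # Never reached: the call fails before the body runs
    return 0, 0

df = pd.DataFrame({"owneroccupancycode": ["Yes"], "year3total": [3000.0], "county": ["ANNE"]})

message = None
try:
    # apply() calls three_params(row, "owneroccupancycode", "year3total", "county")
    df.apply(three_params, args=("owneroccupancycode", "year3total", "county"), axis=1)
except TypeError as err:
    message = str(err)
    print(message)
```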
Looking at your function, I would solve like this:
# Get the columns you want to pass to the function
columns = cleandata[["owneroccupancycode", "year3total", "country"]]
# Apply the function
cleandata[["year3pay1", "year3pay2"]] = columns.apply(semiannualpayments, axis=1, result_type='expand')
And the function looks like this:
def semiannualpayments (row):
    owneroccupied, totalpayment, county = row # Unpack the values from the row
    from interest import anneinterest # Put all imports at the top of the module. anneinterest not used
    if (county == "ANNE"):
        if (owneroccupied == "Yes"):
            paymentone = totalpayment / 2
            paymenttwo = totalpayment - paymentone
            paymenttwo = (paymenttwo * (interestrate(county))) + paymenttwo
        else:
            paymentone = totalpayment
            paymenttwo = 0
    else:
        paymentone = 0
        paymenttwo = 0
    return paymentone, paymenttwo
Or you could pass the column index names as arguments and use these in the function to extract values from the row.
def semiannual_payments (row, occupied, payment, country):
    occupied = row[occupied] # Extract values from row
    payment = row[payment]
    country = row[country]
    # Compute payments
    a = b = 0
    if (country == "ANNE"):
        if (occupied == "Yes"):
            a = payment / 2
            b = (payment - a) * (interest_rate(country) + 1)
        else:
            a = payment
    return a, b
cleandata[["year3pay1", "year3pay2"]] = cleandata.apply(
    semiannual_payments,
    args=("owneroccupancycode", "year3total", "country"),
    axis=1,
    result_type='expand')
Notice that the additional arguments appear after the argument for the series (row). Remember, the extra arguments are strings, not the actual columns. Use indexing to extract the values from the row.
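Here is a self-contained sketch of that second approach that you can run without the rest of the script. It assumes the column is named "county", as in the original question. The interest_rate() function is a hypothetical stand-in (the real lookup is not shown in this thread), the 1% rate is made up, and the three sample rows are invented for illustration.

```python
import pandas as pd

def interest_rate(county):
    # Hypothetical stand-in for the real interest lookup; 1% is a made-up value
    return 0.01 if county == "ANNE" else 0.0

def semiannual_payments(row, occupied_col, payment_col, county_col):
    # The extra arguments are column names; use them to pull values out of the row
    occupied = row[occupied_col]
    payment = row[payment_col]
    county = row[county_col]
    first = second = 0
    if county == "ANNE":
        if occupied == "Yes":
            first = payment / 2
            second = (payment - first) * (interest_rate(county) + 1)
        else:
            first = payment
    return first, second

# Invented sample data
cleandata = pd.DataFrame({
    "owneroccupancycode": ["Yes", "No", "Yes"],
    "year3total": [3000.0, 2000.0, 1000.0],
    "county": ["ANNE", "ANNE", "OTHER"],
})

cleandata[["year3pay1", "year3pay2"]] = cleandata.apply(
    semiannual_payments,
    args=("owneroccupancycode", "year3total", "county"),
    axis=1,
    result_type="expand",
)
print(cleandata)
```

Note the double brackets on the left-hand side: a list of two column names, so each expanded column gets its own name.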
Posts: 30
Threads: 10
Joined: May 2021
I get the following error
Error: TypeError: semiannualpayments() takes 1 positional argument but 3 were given
Posts: 6,800
Threads: 20
Joined: Feb 2020
Apr-09-2023, 11:10 PM
(This post was last modified: Apr-09-2023, 11:10 PM by deanhystad.)
You get that error for what? Where is the code that results in this error? Don't post error messages without code. When posting errors, post the entire error message, including the trace.
If this is the function def:
def semiannualpayments (row):
You do not pass additional arguments using "args=".
Posts: 30
Threads: 10
Joined: May 2021
Apr-10-2023, 01:57 AM
(This post was last modified: Apr-10-2023, 01:57 AM by mbrown009.)
(Apr-09-2023, 11:10 PM)deanhystad Wrote: You get that error for what? Where is the code that results in this error? Don't post error messages without code. When posting errors, post the entire error message, including the trace.
If this is the function def:
def semiannualpayments (row):
You do not pass additional arguments using "args=".
My apologies.
I just edited this post. I got it working; I had forgotten to pull something out of the code.
Posts: 6,800
Threads: 20
Joined: Feb 2020
Apr-10-2023, 02:03 AM
(This post was last modified: Apr-10-2023, 02:08 AM by deanhystad.)
Error says your dataframe does not have a column named "country". Should be "county". Typo in my code. Still, it is something you should have spotted right away.
Posts: 30
Threads: 10
Joined: May 2021
(Apr-10-2023, 02:03 AM)deanhystad Wrote: Error says your dataframe does not have a column named "country". Do you mean "county"? That is not the error you posted earlier.
Yes. The earlier error happened because I forgot to take out something that was no longer needed with the updated function you shared. The next error was because the column name should be "county", not "country".
Once I fixed that, everything worked.
Thank you for your assistance.
Posts: 38
Threads: 0
Joined: Jun 2024
It looks like the issue might be due to the way you're trying to access the values in the year3paytest column. Instead of directly using [0][0], try iterating through the rows or using apply again to split the values.
# Assuming semiannualpayments returns a tuple like (1500, 1505)
cleandata["year3paytest"] = cleandata.apply(lambda x: semiannualpayments(x["owneroccupancycode"], x["year3total"], x["county"]), axis=1)
# Split the tuples into two separate columns
cleandata["year3pay1"] = cleandata["year3paytest"].apply(lambda x: x[0])
cleandata["year3pay2"] = cleandata["year3paytest"].apply(lambda x: x[1])
This should help you split the tuple values into two separate columns without running into the KeyError. Give it a try and see if it resolves the issue!
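For anyone wondering why [0][0] worked on the small test frame but not on the real one: on a Series with an integer index, [0] looks up the label 0, not the first row. The thread does not show how cleandata was built, so the sketch below assumes its index simply no longer contains 0 (for example after rows were filtered out). The index values 3 and 7 and the second tuple are invented.

```python
import pandas as pd

# Tuples stored in a column whose integer index has no label 0
payments = pd.Series([(1500, 1505), (900, 909)], index=[3, 7], name="year3paytest")

caught = False
try:
    payments[0]  # label lookup, not "first row"
except KeyError as err:
    caught = True
    print("KeyError:", err)

# Positional access works regardless of the labels
first_by_position = payments.iloc[0]

# Split every tuple into its own column instead of reading a single cell
split = pd.DataFrame(payments.tolist(), index=payments.index,
                     columns=["year3pay1", "year3pay2"])
print(split)
```

Even where the lookup succeeds, cleandata["year3paytest"][0][0] is a single number, so assigning it to a column would repeat the first row's payment on every row.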