Python Forum
attempt to split values from within a dataframe column
#1
I have the following code snippet from my script:

cleandata["year3paytest"] = cleandata.apply(lambda x: semiannualpayments(x["owneroccupancycode"], x["year3total"], x["county"]), axis=1)
cleandata["year3pay1"] = cleandata["year3paytest"][0][0]
cleandata["year3pay2"] = cleandata["year3paytest"][0][1]
# TODO: enter semiannualpayments in cleandata df
It throws the following error:

Error:
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 2263, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 2273, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\GitHub\PropertyTax\tanyardsprings.py", line 219, in <module>
    cleandata["year3pay1"] = cleandata["year3paytest"][0][0]
                             ~~~~~~~~~~~~~~~~~~~~~~~~~^^^
  File "C:\GitHub\PropertyTax\venv\Lib\site-packages\pandas\core\series.py", line 981, in __getitem__
    return self._get_value(key)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\GitHub\PropertyTax\venv\Lib\site-packages\pandas\core\series.py", line 1089, in _get_value
    loc = self.index.get_loc(label)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\GitHub\PropertyTax\venv\Lib\site-packages\pandas\core\indexes\base.py", line 3804, in get_loc
    raise KeyError(key) from err
KeyError: 0

Process finished with exit code 1
What I am trying to do is split the values into two different columns in the data frame. For example, if the value of year3paytest is (1500, 1505), I want the first value to go into year3pay1 and the second into year3pay2.

This works without issue in a smaller snippet of code that I have for my initial testing. The only things that changed were the dataframe name and the number of rows in that dataframe.

What am I missing here?
Reply
#2
Like this?
import pandas as pd


def myfunc(row):
    return row[0] + row[1], row[0] * row[1]


df = pd.DataFrame({"A": range(5), "B": range(10, 15), "C": range(20, 25)})
df[["D", "E"]] = df[["A", "B"]].apply(myfunc, axis=1, result_type='expand')
print(df)
Output:
   A   B   C   D   E
0  0  10  20  10   0
1  1  11  21  12  11
2  2  12  22  14  24
3  3  13  23  16  39
4  4  14  24  18  56
This part creates a dataframe that has columns "A" and "B".
df[["A", "B"]]
This says we are going to pass the rows of that dataframe (axis=1) to myfunc().
df[["A", "B"]].apply(myfunc, axis=1)
myfunc() returns a tuple with two values. I want to expand this to two columns.
df[["A", "B"]].apply(myfunc, axis=1, result_type='expand')
And I want to add these two new columns to df and call them "D" and "E"
df[["D", "E"]] = df[["A", "B"]].apply(myfunc, axis=1, result_type='expand')
Notice that the argument to myfunc() is a series (row), and I get the values from the series using an integer position instead of an index name ("A", "B"). (On recent pandas versions, write row.iloc[0] for positional access; plain row[0] on a string-labelled series emits a FutureWarning.)
def myfunc(row):
    return row[0] + row[1], row[0] * row[1]
This lets me apply the same function to any two columns. All I have to do is create a different dataframe (df[["A", "C"]] for example) that is used to supply the rows.
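As a small sketch of that reuse, here is the same two-value function applied to columns "A" and "C" instead of "A" and "B". The output column names "F" and "G" are made up for illustration, and the function uses .iloc for positional access so it runs cleanly on current pandas versions.

```python
import pandas as pd


def myfunc(row):
    # Positional access, so the function does not care what the columns are called.
    return row.iloc[0] + row.iloc[1], row.iloc[0] * row.iloc[1]


df = pd.DataFrame({"A": range(5), "B": range(10, 15), "C": range(20, 25)})

# Supply a different pair of columns; "F" and "G" are illustrative names.
df[["F", "G"]] = df[["A", "C"]].apply(myfunc, axis=1, result_type="expand")
print(df)
```

Only the column selection on the right-hand side changes; the function itself is untouched.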

Alternatively, I could pass the entire row and use column index names inside myfunc() to access the values.
import pandas as pd


def function(row):
    return row["A"] + row["B"], row["A"] * row["B"]


df = pd.DataFrame({"A": range(5), "B": range(10, 15), "C": range(20, 25)})
df[["D", "E"]] = df.apply(function, axis=1, result_type='expand')
print(df)
The result is the same. The same is true of this version, where I pass the column index names as additional arguments.
import pandas as pd


def function(row, a, b):
    return row[a] + row[b], row[a] * row[b]


df = pd.DataFrame({"A": range(5), "B": range(10, 15), "C": range(20, 25)})
df[["D", "E"]] = df.apply(function, args=("A", "B"), axis=1, result_type='expand')
print(df)
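Coming back to the KeyError: 0 in the first post, one plausible cause (an assumption, since the thread never shows the dataframe's index) is that the larger dataframe has no row labelled 0, for example after rows were filtered out. On a series with an integer index, series[0] is a label lookup, not "the first row". The sketch below reproduces that with made-up data and then splits an existing column of tuples into two columns without calling apply() again.

```python
import pandas as pd

# Made-up data: the index labels start at 5, as they might after filtering rows.
df = pd.DataFrame({"pay": [(1500, 1505), (200, 210)]}, index=[5, 6])

try:
    df["pay"][0]  # label lookup: there is no row labelled 0
except KeyError as err:
    print("KeyError:", err)

# Build a two-column frame from the tuples, keeping the original index so the
# assignment lines up row by row.
df[["pay1", "pay2"]] = pd.DataFrame(df["pay"].tolist(), index=df.index)
print(df)
```

Note that even where the lookup succeeds, cleandata["year3paytest"][0][0] is a single number, so assigning it to a column would copy the first row's value into every row.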
Reply
#3
I think I got this. My updated code is below:

Function that is being called:
def semiannualpayments (owneroccupied, totalpayment, county):
    from interest import anneinterest 
    if (county == "ANNE"):
        if (owneroccupied == "Yes"):
            paymentone = totalpayment / 2
            paymenttwo = totalpayment - paymentone
            paymenttwo = (paymenttwo * (interestrate(county))) + paymenttwo
            return paymentone, paymenttwo
        else:
            paymentone = totalpayment
            paymenttwo = 0
            return paymentone, paymenttwo
    else:
        paymentone = 0
        paymenttwo = 0
        return paymentone, paymenttwo
Main code:
cleandata["year3pay1", "year3pay2"] = cleandata.apply(semiannualpayments,
                                                      args=("owneroccupancycode", "year3total", "county"), axis=1,
                                                      result_type='expand')
I am now getting the following error:

Error:
Traceback (most recent call last):
  File "C:\GitHub\PropertyTax\tanyardsprings.py", line 230, in <module>
    cleandata["year3pay1", "year3pay2"] = cleandata.apply(semiannualpayments,
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\GitHub\PropertyTax\venv\Lib\site-packages\pandas\core\frame.py", line 9568, in apply
    return op.apply().__finalize__(self, method="apply")
           ^^^^^^^^^^
  File "C:\GitHub\PropertyTax\venv\Lib\site-packages\pandas\core\apply.py", line 764, in apply
    return self.apply_standard()
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\GitHub\PropertyTax\venv\Lib\site-packages\pandas\core\apply.py", line 891, in apply_standard
    results, res_index = self.apply_series_generator()
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\GitHub\PropertyTax\venv\Lib\site-packages\pandas\core\apply.py", line 907, in apply_series_generator
    results[i] = self.f(v)
                 ^^^^^^^^^
  File "C:\GitHub\PropertyTax\venv\Lib\site-packages\pandas\core\apply.py", line 142, in f
    return func(x, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: semiannualpayments() takes 3 positional arguments but 4 were given

Process finished with exit code 1
Reply
#4
You are missing important details about how arguments get passed to the function. I'll use this simple example to poke around and see what arguments are passed to the function when using apply() with the "args" keyword.
import pandas as pd

def function(*args):
    print(*map(type, args))  # Print arg types
    return 1, 2

df = pd.DataFrame({"A": [1], "B": [3], "C": [5]})
df[["D", "E"]] = df.apply(function, args=("A", "B"), axis=1, result_type='expand')
Output:
<class 'pandas.core.series.Series'> <class 'str'> <class 'str'>
The first argument passed to the function is a series. The next two arguments are strings. Let's see what the values are.
import pandas as pd

def function(*args):
    print(*args)  # print arg values
    return 1, 2

df = pd.DataFrame({"A": [1], "B": [3], "C": [5]})
df[["D", "E"]] = df.apply(function, args=("A", "B"), axis=1, result_type='expand')
Output:
A    1
B    3
C    5
Name: 0, dtype: int64 A B
The first argument is a series. In this case it is one row of the dataframe "df" (Name: 0 is the row's index label). The next two arguments are the strings "A" and "B". The series is passed as the first argument to the function because that's what apply does. When you "apply" a function to a dataframe with axis=1, each row is passed, as a series, as the first argument to the function. The strings "A" and "B" are passed because I have args=("A", "B") as an argument to the "apply" function.

Now look at your code. This is how you call the function.
cleandata["year3pay1", "year3pay2"] = cleandata.apply(semiannualpayments,
                                                      args=("owneroccupancycode", "year3total", "county"), axis=1,
                                                      result_type='expand')
The first argument passed to your function is a series holding one row of "cleandata". You also pass three string arguments: "owneroccupancycode", "year3total" and "county".

Look at your function:
def semiannualpayments (owneroccupied, totalpayment, county):
The first problem is that your function takes three arguments, but you pass four: series, str, str, str.
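A minimal reproduction of that mismatch, as a sketch: the one-row dataframe and the stub function three_args are made up here, with the stub keeping the same three-parameter signature as semiannualpayments.

```python
import pandas as pd


def three_args(owneroccupied, totalpayment, county):
    # Stub with the same signature as the function in the question.
    return 0, 0


# Made-up single-row frame using the column names from the thread.
df = pd.DataFrame(
    {"owneroccupancycode": ["Yes"], "year3total": [3000.0], "county": ["ANNE"]}
)

try:
    # apply() supplies the row itself, then the three strings from args.
    df.apply(
        three_args,
        args=("owneroccupancycode", "year3total", "county"),
        axis=1,
    )
except TypeError as err:
    message = str(err)
    print(message)
```

The row plus the three strings make four positional arguments, which is exactly what the traceback reports.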

Looking deeper into your function we see additional problems:
    if (county == "ANNE"):
        if (owneroccupied == "Yes"):
It appears that you think county might be "ANNE" and that owneroccupied might be "Yes". This will never happen. owneroccupied is the first argument. It will be a series holding a row of cleandata. county is the third argument. It will be the string "year3total". You also do some math with totalpayment, but totalpayment is the second argument, and will always be the string "owneroccupancycode".

Looking at your function, I would solve like this:
# Get the columns you want to pass to the function
columns = cleandata[["owneroccupancycode", "year3total", "county"]]
# Apply the function
cleandata[["year3pay1", "year3pay2"]] = columns.apply(semiannualpayments, axis=1, result_type='expand')
And the function looks like this:
def semiannualpayments (row):
    owneroccupied, totalpayment, county = row  # Unpack the values from the row
    from interest import anneinterest   # Put all imports at the top of the module.  anneinterest not used
    if (county == "ANNE"):
        if (owneroccupied == "Yes"):
            paymentone = totalpayment / 2
            paymenttwo = totalpayment - paymentone
            paymenttwo = (paymenttwo * (interestrate(county))) + paymenttwo
        else:
            paymentone = totalpayment
            paymenttwo = 0
    else:
        paymentone = 0
        paymenttwo = 0
    return paymentone, paymenttwo
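To see the row-unpacking approach run end to end, here is a sketch with made-up data. The interestrate() helper is never shown in the thread, so the version below is a hypothetical stand-in that returns a flat 1% rate, and the three sample rows are invented to exercise each branch.

```python
import pandas as pd


def interestrate(county):
    # Hypothetical stand-in: the real helper is not shown in the thread.
    return 0.01


def semiannualpayments(row):
    # Unpack the three values in the order the columns were selected.
    owneroccupied, totalpayment, county = row
    if county == "ANNE":
        if owneroccupied == "Yes":
            paymentone = totalpayment / 2
            paymenttwo = (totalpayment - paymentone) * (1 + interestrate(county))
        else:
            paymentone = totalpayment
            paymenttwo = 0
    else:
        paymentone = 0
        paymenttwo = 0
    return paymentone, paymenttwo


# Made-up rows: owner occupied, not owner occupied, and a different county.
cleandata = pd.DataFrame({
    "owneroccupancycode": ["Yes", "No", "Yes"],
    "year3total": [3000.0, 3000.0, 3000.0],
    "county": ["ANNE", "ANNE", "OTHER"],
})

columns = cleandata[["owneroccupancycode", "year3total", "county"]]
cleandata[["year3pay1", "year3pay2"]] = columns.apply(
    semiannualpayments, axis=1, result_type="expand"
)
print(cleandata)
```

Because the function unpacks by position, the order of the names in the column selection has to match the order of the unpacking.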
Or you could pass the column index names as arguments and use these in the function to extract values from the row.
def semiannual_payments(row, occupied, payment, county):
    occupied = row[occupied]  # Extract values from row
    payment = row[payment]
    county = row[county]

    # Compute payments
    a = b = 0
    if county == "ANNE":
        if occupied == "Yes":
            a = payment / 2
            b = (payment - a) * (interest_rate(county) + 1)
        else:
            a = payment
    return a, b


cleandata[["year3pay1", "year3pay2"]] = cleandata.apply(
    semiannual_payments,
    args=("owneroccupancycode", "year3total", "county"),
    axis=1,
    result_type='expand')
Notice that the additional arguments appear after the argument for the series (row). Remember, the extra arguments are strings, not the actual columns. Use indexing to extract the values from the row.
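One reason to prefer passing column names is that the same function can then serve several payment columns. The sketch below assumes a second column, "year2total", which is hypothetical (the thread only mentions "year3total"), and uses a stand-in interest_rate() that returns a flat 1%.

```python
import pandas as pd


def interest_rate(county):
    # Hypothetical stand-in for the rate lookup, which the thread does not show.
    return 0.01


def semiannual_payments(row, occupied, payment, county):
    # The extra arguments are column names; look the values up in the row.
    occupied, payment, county = row[occupied], row[payment], row[county]
    a = b = 0
    if county == "ANNE":
        if occupied == "Yes":
            a = payment / 2
            b = (payment - a) * (interest_rate(county) + 1)
        else:
            a = payment
    return a, b


cleandata = pd.DataFrame({
    "owneroccupancycode": ["Yes", "No"],
    "county": ["ANNE", "ANNE"],
    "year2total": [2000.0, 2000.0],  # hypothetical column, for illustration
    "year3total": [3000.0, 3000.0],
})

# Reuse the function for each year by changing only the payment column name.
for year in ("year2", "year3"):
    cleandata[[f"{year}pay1", f"{year}pay2"]] = cleandata.apply(
        semiannual_payments,
        args=("owneroccupancycode", f"{year}total", "county"),
        axis=1,
        result_type="expand",
    )
print(cleandata)
```

Since the function looks values up by name, it does not matter that the dataframe gains extra columns between iterations.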
Reply
#5
I get the following error

Error:
TypeError: semiannualpayments() takes 1 positional argument but 3 were given
Reply
#6
You get that error for what? Where is the code that results in this error? Don't post error messages without code. When posting errors, post the entire error message, including the trace.

If this is the function def:
def semiannualpayments (row):
You do not pass additional arguments using "args=".
Reply
#7
(Apr-09-2023, 11:10 PM)deanhystad Wrote: You get that error for what? Where is the code that results in this error? Don't post error messages without code. When posting errors, post the entire error message, including the trace.

If this is the function def:
def semiannualpayments (row):
You do not pass additional arguments using "args=".

My apologies.


I just edited this post. I got it working; I had forgotten to pull something out of the code.
Reply
#8
The error says your dataframe does not have a column named "country". It should be "county"; that was a typo in my code. Still, it is something you should have spotted right away.
Reply
#9
(Apr-10-2023, 02:03 AM)deanhystad Wrote: Error says your dataframe does not have a column named "country". Do you mean "county"? That is not the error you posted earlier.

Yes. The earlier error came from something I had forgotten to remove when switching to the updated function that you shared. The next error was because the column name should have been "county", not "country".

Once I fixed that, everything worked.

Thank you for your assistance.
Reply

