attempt to split values from within a dataframe column

mbrown009 · Apr-07-2023, 01:54 AM

I have the following code snippet from my script

cleandata["year3paytest"] = cleandata.apply(lambda x: semiannualpayments(x["owneroccupancycode"], x["year3total"], x["county"]), axis=1)
cleandata["year3pay1"] = cleandata["year3paytest"][0][0]
cleandata["year3pay2"] = cleandata["year3paytest"][0][1]
# TODO: enter semiannualpayments in cleandata df

It throws the following error:

Error:
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 2263, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 2273, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\GitHub\PropertyTax\tanyardsprings.py", line 219, in <module>
    cleandata["year3pay1"] = cleandata["year3paytest"][0][0]
                             ~~~~~~~~~~~~~~~~~~~~~~~~~^^^
  File "C:\GitHub\PropertyTax\venv\Lib\site-packages\pandas\core\series.py", line 981, in __getitem__
    return self._get_value(key)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\GitHub\PropertyTax\venv\Lib\site-packages\pandas\core\series.py", line 1089, in _get_value
    loc = self.index.get_loc(label)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\GitHub\PropertyTax\venv\Lib\site-packages\pandas\core\indexes\base.py", line 3804, in get_loc
    raise KeyError(key) from err
KeyError: 0

Process finished with exit code 1

What I am trying to do is split the values into two different columns in the data frame. For example, if the value of year3paytest is (1500, 1505), I want the first value to become year3pay1 and the second to become year3pay2.

This works without issue in a smaller snippet of code that I have for my initial testing. The only things that changed were the dataframe name and the number of rows in that dataframe.

What am I missing here?

**deanhystad** · Apr-07-2023, 03:34 AM

Like this?

import pandas as pd


def myfunc(row):
    return row[0] + row[1], row[0] * row[1]


df = pd.DataFrame({"A": range(5), "B": range(10, 15), "C": range(20, 25)})
df[["D", "E"]] = df[["A", "B"]].apply(myfunc, axis=1, result_type='expand')
print(df)

Output:
   A   B   C   D   E
0  0  10  20  10   0
1  1  11  21  12  11
2  2  12  22  14  24
3  3  13  23  16  39
4  4  14  24  18  56

This part creates a dataframe that has columns "A" and "B".

df[["A", "B"]]

This says we are going to pass the rows of that dataframe (axis=1) to myfunc().

df[["A", "B"]].apply(myfunc, axis=1)

myfunc() returns a tuple with two values. I want to expand this to two columns.

df[["A", "B"]].apply(myfunc, axis=1, result_type='expand')

And I want to add these two new columns to df and call them "D" and "E"

df[["D", "E"]] = df[["A", "B"]].apply(myfunc, axis=1, result_type='expand')

Notice that the argument to myfunc() is a series (row), and I get the values of the series using an integer index instead of an index name ("A", "B").

def function(row):
    return row[0] + row[1], row[0] * row[1]

This lets me apply the same function to any two columns. All I have to do is create a different dataframe (df[["A", "C"]] for example) that is used to supply the rows.
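As a minimal sketch of that reuse, the same kind of function can be pointed at columns "A" and "C". The function name sum_and_product and the output columns "F" and "G" are made up for illustration. This version uses .iloc for positional access, since recent pandas releases deprecate plain row[0] on a series whose index holds column names.

```python
import pandas as pd


def sum_and_product(row):
    # .iloc selects by position, so the column names do not matter
    return row.iloc[0] + row.iloc[1], row.iloc[0] * row.iloc[1]


df = pd.DataFrame({"A": range(5), "B": range(10, 15), "C": range(20, 25)})
# Same function, different pair of columns supplying the rows
df[["F", "G"]] = df[["A", "C"]].apply(sum_and_product, axis=1, result_type="expand")
print(df)
```

Only the selection df[["A", "C"]] changes; the function itself never needs to know which columns it was given.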

Alternatively, I could pass the entire row and use column index names inside myfunc() to access the values.

import pandas as pd


def function(row):
    return row["A"] + row["B"], row["A"] * row["B"]


df = pd.DataFrame({"A": range(5), "B": range(10, 15), "C": range(20, 25)})
df[["D", "E"]] = df.apply(function, axis=1, result_type='expand')
print(df)

The result is the same. The same is true of this version, where I pass the column index names as additional arguments.

import pandas as pd


def function(row, a, b):
    return row[a] + row[b], row[a] * row[b]


df = pd.DataFrame({"A": range(5), "B": range(10, 15), "C": range(20, 25)})
df[["D", "E"]] = df.apply(function, args=("A", "B"), axis=1, result_type='expand')
print(df)

mbrown009 · Apr-08-2023, 02:11 AM

I think I got this. My updated code is below:

function that is being called

def semiannualpayments (owneroccupied, totalpayment, county):
    from interest import anneinterest 
    if (county == "ANNE"):
        if (owneroccupied == "Yes"):
            paymentone = totalpayment / 2
            paymenttwo = totalpayment - paymentone
            paymenttwo = (paymenttwo * (interestrate(county))) + paymenttwo
            return paymentone, paymenttwo
        else:
            paymentone = totalpayment
            paymenttwo = 0
            return paymentone, paymenttwo
    else:
        paymentone = 0
        paymenttwo = 0
        return paymentone, paymenttwo

main code

cleandata["year3pay1", "year3pay2"] = cleandata.apply(semiannualpayments,
                                                      args=("owneroccupancycode", "year3total", "county"), axis=1,
                                                      result_type='expand')

I am now getting the following error:

Error:
Traceback (most recent call last):
  File "C:\GitHub\PropertyTax\tanyardsprings.py", line 230, in <module>
    cleandata["year3pay1", "year3pay2"] = cleandata.apply(semiannualpayments,
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\GitHub\PropertyTax\venv\Lib\site-packages\pandas\core\frame.py", line 9568, in apply
    return op.apply().__finalize__(self, method="apply")
           ^^^^^^^^^^
  File "C:\GitHub\PropertyTax\venv\Lib\site-packages\pandas\core\apply.py", line 764, in apply
    return self.apply_standard()
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\GitHub\PropertyTax\venv\Lib\site-packages\pandas\core\apply.py", line 891, in apply_standard
    results, res_index = self.apply_series_generator()
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\GitHub\PropertyTax\venv\Lib\site-packages\pandas\core\apply.py", line 907, in apply_series_generator
    results[i] = self.f(v)
                 ^^^^^^^^^
  File "C:\GitHub\PropertyTax\venv\Lib\site-packages\pandas\core\apply.py", line 142, in f
    return func(x, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: semiannualpayments() takes 3 positional arguments but 4 were given

Process finished with exit code 1

**deanhystad** · (This post was last modified: Apr-09-2023, 10:54 PM by deanhystad.)

You are missing important details about how arguments get passed to the function. I'll use this simple example to poke around and see what arguments are passed to the function when using apply() with the "args" keyword.

import pandas as pd

def function(*args):
    print(*map(type, args))  # Print arg types
    return 1, 2

df = pd.DataFrame({"A": [1], "B": [3], "C": [5]})
df[["D", "E"]] = df.apply(function, args=("A", "B"), axis=1, result_type='expand')

Output:
<class 'pandas.core.series.Series'> <class 'str'> <class 'str'>

The first argument passed to the function is a series. The next two arguments are strings. Let's see what the values are.

import pandas as pd

def function(*args):
    print(*args)  # print arg values
    return 1, 2

df = pd.DataFrame({"A": [1], "B": [3], "C": [5]})
df[["D", "E"]] = df.apply(function, args=("A", "B"), axis=1, result_type='expand')

Output:
A    1
B    3
C    5
Name: 0, dtype: int64 A B

The first argument is a series. In this case it is one row of the dataframe "df" (note "Name: 0", the row's index label). The next two arguments are the strings "A" and "B". The series is passed as the first argument to the function because that's what apply does. When you apply a function to a dataframe with axis=1, each row is passed to the function as a series. The strings "A" and "B" are passed because I have args=("A", "B") as an argument to the "apply" function.

Now look at your code. This is how you call the function.

cleandata["year3pay1", "year3pay2"] = cleandata.apply(semiannualpayments,
                                                      args=("owneroccupancycode", "year3total", "county"), axis=1,
                                                      result_type='expand')

The first argument passed to your function is a series holding one row of "cleandata". You also pass three string arguments: "owneroccupancycode", "year3total" and "county".

Look at your function:

def semiannualpayments (owneroccupied, totalpayment, county):

The first problem is that your function takes three arguments, but you pass four: series, str, str, str.
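The mismatch can be reproduced in isolation. This is a small sketch using a hypothetical function three_args and a one-row dataframe, not your actual data:

```python
import pandas as pd


def three_args(a, b, c):
    # Accepts exactly three positional arguments
    return 1, 2


df = pd.DataFrame({"A": [1], "B": [3], "C": [5]})

try:
    # apply() passes the row first, then the three strings: four arguments
    df.apply(three_args, args=("A", "B", "C"), axis=1, result_type="expand")
    message = ""
except TypeError as err:
    message = str(err)

print(message)
```

The message printed is the same kind of TypeError shown in your traceback, with the row counted as the extra argument.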

Looking deeper into your function we see additional problems:

    if (county == "ANNE"):
        if (owneroccupied == "Yes"):

It appears that you think county might be "ANNE" and that owneroccupied might be "Yes". This will never happen. owneroccupied is the first argument, so it will be a series holding a row of cleandata. county is the third argument, so it will be the string "year3total". You also do some math with totalpayment, but totalpayment is the second argument and will always be the string "owneroccupancycode".

Looking at your function, I would solve like this:

# Get the columns you want to pass to the function
columns = cleandata[["owneroccupancycode", "year3total", "country"]]
# Apply the function
cleandata[["year3pay1", "year3pay2"]] = columns.apply(semiannualpayments, axis=1, result_type='expand')

And the function looks like this:

def semiannualpayments (row):
    owneroccupied, totalpayment, county = row  # Unpack the values from the row
    from interest import anneinterest   # Put all imports at the top of the module.  anneinterest not used
    if (county == "ANNE"):
        if (owneroccupied == "Yes"):
            paymentone = totalpayment / 2
            paymenttwo = totalpayment - paymentone
            paymenttwo = (paymenttwo * (interestrate(county))) + paymenttwo
        else:
            paymentone = totalpayment
            paymenttwo = 0
    else:
        paymentone = 0
        paymenttwo = 0
    return paymentone, paymenttwo

Or you could pass the column index names as arguments and use these in the function to extract values from the row.

def semiannual_payments (row, occupied, payment, country):
    occupied = row[occupied]  # Extract values from row
    payment = row[payment]
    country = row[country]

    # Compute payments
    a = b = 0
    if (country == "ANNE"):
        if (occupied == "Yes"):
            a = payment / 2
            b = (payment - a) * (interest_rate(country) + 1)
        else:
            a = payment
    return a, b


cleandata[["year3pay1", "year3pay2"]] = cleandata.apply(
    semiannual_payments,
    args=("owneroccupancycode", "year3total", "country"),
    axis=1,
    result_type='expand')

Notice that the additional arguments appear after the argument for the series (row). Remember, the extra arguments are strings, not the actual columns. Use indexing to extract the values from the row.
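A third option, sketched here under assumptions, keeps a three-argument function and lets a lambda pull the values out of each row. The interest_rate helper and its 0.01 rate are hypothetical stand-ins, because the real rate lookup is never shown in this thread, and the sample rows are invented.

```python
import pandas as pd


def interest_rate(county):
    # Hypothetical stand-in: the real rate lookup is not shown in the thread
    return 0.01


def semiannual_payments(owner_occupied, total_payment, county):
    if county != "ANNE":
        return 0, 0
    if owner_occupied != "Yes":
        return total_payment, 0
    first = total_payment / 2
    second = (total_payment - first) * (1 + interest_rate(county))
    return first, second


# Invented sample data using the column names from the thread
cleandata = pd.DataFrame({
    "owneroccupancycode": ["Yes", "No", "Yes"],
    "year3total": [3000, 2000, 1000],
    "county": ["ANNE", "ANNE", "OTHER"],
})

# The lambda receives each row and passes plain values to the function
cleandata[["year3pay1", "year3pay2"]] = cleandata.apply(
    lambda row: semiannual_payments(
        row["owneroccupancycode"], row["year3total"], row["county"]
    ),
    axis=1,
    result_type="expand",
)
print(cleandata)
```

With this shape the function stays usable outside pandas, since it only ever sees ordinary values rather than a row.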

mbrown009 · Apr-09-2023, 10:48 PM

I get the following error

Error:
TypeError: semiannualpayments() takes 1 positional argument but 3 were given

**deanhystad** · (This post was last modified: Apr-09-2023, 11:10 PM by deanhystad.)

You get that error for what? Where is the code that results in this error? Don't post error messages without code. When posting errors, post the entire error message, including the trace.

If this is the function def:

def semiannualpayments (row):

You do not pass additional arguments using "args=".

mbrown009 · (This post was last modified: Apr-10-2023, 01:57 AM by mbrown009.)

(Apr-09-2023, 11:10 PM)deanhystad Wrote: You get that error for what? Where is the code that results in this error? Don't post error messages without code. When posting errors, post the entire error message, including the trace.

If this is the function def:
def semiannualpayments (row):
You do not pass additional arguments using "args=".

My apologies.

I just edited this post. I got it working; I had forgotten to pull something out of the code.

**deanhystad** · (This post was last modified: Apr-10-2023, 02:08 AM by deanhystad.)

Error says your dataframe does not have a column named "country". Should be "county". Typo in my code. Still, it is something you should have spotted right away.

mbrown009 · Apr-10-2023, 02:06 AM

(Apr-10-2023, 02:03 AM)deanhystad Wrote: Error says your dataframe does not have a column named "country". Do you mean "county"? That is not the error you posted earlier.

Yes. The earlier error happened because I forgot to remove something that no longer belonged with the updated function you shared. The next error occurred because the column name should be "county", not "country".

Once I fixed that all worked.

Thank you for your assistance

AdamHensley · Jun-20-2024, 07:59 PM

It looks like the issue might be due to the way you're trying to access the values in the year3paytest column. Instead of directly using [0][0], try iterating through the rows or using apply again to split the values.

# Assuming semiannualpayments returns a tuple like (1500, 1505)
cleandata["year3paytest"] = cleandata.apply(lambda x: semiannualpayments(x["owneroccupancycode"], x["year3total"], x["county"]), axis=1)
# Split the tuples into two separate columns
cleandata["year3pay1"] = cleandata["year3paytest"].apply(lambda x: x[0])
cleandata["year3pay2"] = cleandata["year3paytest"].apply(lambda x: x[1])

This should help you split the tuple values into two separate columns without running into the KeyError. Give it a try and see if it resolves the issue!
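For context on the KeyError itself, here is a small sketch with invented data. It assumes the dataframe's integer index no longer contains the label 0, for example after rows were filtered out. On an integer index, series[0] is a label lookup, which is where the original traceback fails. Even when the label exists, [0][0] yields one scalar, which would fill the whole new column with the first row's value.

```python
import pandas as pd

# Invented data: the index starts at 5, as it might after filtering rows
df = pd.DataFrame({"pair": [(1500, 1505), (800, 0)]}, index=[5, 6])

try:
    df["pair"][0]  # label lookup: there is no row labelled 0
    lookup_failed = False
except KeyError:
    lookup_failed = True

first_pair = df["pair"].iloc[0]  # position lookup works whatever the labels are

# Split every tuple at once; reuse the index so rows align on assignment
df[["pay1", "pay2"]] = pd.DataFrame(df["pair"].tolist(), index=df.index)
print(df)
```

Building the two columns from tolist() avoids calling apply twice and keeps each row's pair together.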
