Calculated DF column from dictionary value

plantagenet · (This post was last modified: Sep-15-2022, 12:53 PM by plantagenet.)

I would like to make a calculated column ('new'), as shown here, from the existing 'Data' column (which holds a list with a dictionary inside). It works in this code.

import pandas as pd

data = [10,[{'self': 'https://elia.atlassian.net/rest/api/3/customFieldOption/10200', 'value': 'IT-Sourced Changes 2022', 'id': '10200'}],30]
df = pd.DataFrame(data, columns=['Data'])
# explode() unpacks the list; .str['value'] then pulls the 'value' key from each dict
df['new'] = df.Data.explode().str['value']
df.head(3)

However, when I try it on an existing dataframe, I get 'ValueError: cannot reindex from a duplicate axis'. Not sure why.

https://imgur.com/a/B4qEOWa

**deanhystad** · (This post was last modified: Sep-15-2022, 03:08 PM by deanhystad.)

Explode does this:

import pandas as pd

df = pd.DataFrame({"Data": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})
print(df)
print(df.Data.explode())

Output:
        Data
0  [1, 2, 3]
1  [4, 5, 6]
2  [7, 8, 9]
0    1
0    2
0    3
1    4
1    5
1    6
2    7
2    8
2    9

Notice all the duplicate index values generated by explode(). When I try to add this as a column to an existing dataframe I get the same error you are seeing.
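The failure can be reproduced in a few lines. This is a minimal sketch using the same toy frame; the wording of the error message differs between pandas versions, so the example relies only on the exception type.

```python
import pandas as pd

df = pd.DataFrame({"Data": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})
exploded = df.Data.explode()

# The exploded series repeats each row label once per list element,
# so its index is no longer unique.
print(exploded.index.is_unique)

error = None
try:
    # Column assignment aligns on index labels, which fails for duplicates.
    df["Explode"] = exploded
except ValueError as exc:
    error = exc
    print("ValueError:", exc)
```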

What happens if I reset the index to count up from zero? That will get rid of the duplicate index values.

import pandas as pd

df = pd.DataFrame({"Data": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})
series = df.Data.explode().reset_index(drop=True)
print(series)
df["Explode"] = series
print(df)

Output:
0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8
8    9
Name: Data, dtype: object
        Data Explode
0  [1, 2, 3]       1
1  [4, 5, 6]       2
2  [7, 8, 9]       3

This works! But why does the index matter?

import pandas as pd

df = pd.DataFrame({"Data": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})
series = df.Data.explode().reset_index(drop=True)
df = df[1:]
print(df)
df["Explode"] = series
print(df)

Output:
        Data
1  [4, 5, 6]
2  [7, 8, 9]
        Data Explode
1  [4, 5, 6]       2
2  [7, 8, 9]       3

When adding a series to an existing dataframe, pandas uses the index labels to line up the new values with the rows. Notice that series starts at 1, but when added to the dataframe the new column starts at 2. This is because the first index label in df is 1, and series.loc[1] == 2.
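A small sketch makes the label matching easier to see. The string labels and the numbers below are made up for illustration: the series is deliberately out of order and carries one label ("z") that the dataframe does not have.

```python
import pandas as pd

df = pd.DataFrame({"Data": [10, 20, 30]}, index=["a", "b", "c"])

# Same labels as df for "a" and "c", in a different order, plus an extra "z".
series = pd.Series([300, 100, 999], index=["c", "a", "z"])

# Values are placed by label, not by position.
df["New"] = series
print(df)
```

Row "a" receives 100 and row "c" receives 300 regardless of their order in the series. Row "b" has no matching label and ends up as NaN, and the value labelled "z" is dropped.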

So you are getting an error because pandas does not know what to do with the duplicate index values created by explode(). That makes sense. How else would you collate other than using the row index values?
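If the goal is one value per original row, as in the question, another option is to collapse the exploded series back to unique index labels before assigning it. The sketch below is an assumption about the intent rather than something from this thread: it keeps the first 'value' found in each row's list, and the sample data (including the two-element list that causes the duplicate labels) is invented for illustration.

```python
import pandas as pd

# Hypothetical data: the second row holds two dictionaries, so explode()
# produces a duplicate index label for it.
df = pd.DataFrame(
    {
        "Data": [
            [{"value": "A", "id": "1"}],
            [{"value": "B", "id": "2"}, {"value": "C", "id": "3"}],
            [{"value": "D", "id": "4"}],
        ]
    }
)

values = df["Data"].explode().str["value"]

# groupby(level=0) groups by the original row label; first() keeps one
# value per row, so the index is unique again and assignment can align.
df["new"] = values.groupby(level=0).first()
print(df["new"].tolist())
```

Unlike reset_index(drop=True), this keeps each value attached to the row it came from, at the cost of discarding any additional entries in the list.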

plantagenet · Sep-16-2022, 12:46 AM

This is my first time using this forum, and that not only worked, it's the best explanation I've ever gotten for fixing a problem. Thank you so much.
