Python Forum
Calculated DF column from dictionary value
#1
I would like to make a calculated column ('new'), as shown here, from the existing 'Data' column (which holds a list with a dictionary inside). It works in this code.

import pandas as pd

data = [10,[{'self': 'https://elia.atlassian.net/rest/api/3/customFieldOption/10200', 'value': 'IT-Sourced Changes 2022', 'id': '10200'}],30]
df = pd.DataFrame(data, columns=['Data'])
df['new'] = df.Data.explode().str['value']
df.head(3)
However, when I try it on an existing dataframe, I get 'ValueError: cannot reindex from a duplicate axis'. Not sure why.

https://imgur.com/a/B4qEOWa
#2
Explode does this:
import pandas as pd

df = pd.DataFrame({"Data": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})
print(df)
print(df.Data.explode())
Output:
        Data
0  [1, 2, 3]
1  [4, 5, 6]
2  [7, 8, 9]
0    1
0    2
0    3
1    4
1    5
1    6
2    7
2    8
2    9
Notice all the duplicate index values generated by explode(). When I try to add this as a column to an existing dataframe I get the same error you are seeing.
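A minimal sketch that reproduces the error, reusing the small frame above. The names exploded and error are only for illustration, and the exact message text differs between pandas versions, so the sketch relies only on the exception type:

```python
import pandas as pd

df = pd.DataFrame({"Data": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})
exploded = df["Data"].explode()

# Each original row label now appears three times in the exploded index.
has_duplicates = exploded.index.has_duplicates
print(has_duplicates)

# Assigning the series as a column makes pandas reindex it to df.index,
# which fails because the labels are not unique.
error = None
try:
    df["Explode"] = exploded
except ValueError as exc:
    error = exc
print(type(error).__name__, error)
```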

What happens if I reset the index to count up from zero? That will get rid of the duplicate index values.
import pandas as pd

df = pd.DataFrame({"Data": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})
series = df.Data.explode().reset_index(drop=True)
print(series)
df["Explode"] = series
print(df)
Output:
0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8
8    9
Name: Data, dtype: object
        Data Explode
0  [1, 2, 3]       1
1  [4, 5, 6]       2
2  [7, 8, 9]       3
This works! But why does the index matter?
import pandas as pd

df = pd.DataFrame({"Data": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})
series = df.Data.explode().reset_index(drop=True)
df = df[1:].copy()  # drop the first row (index now starts at 1); copy() avoids a SettingWithCopyWarning
print(df)
df["Explode"] = series
print(df)
Output:
        Data
1  [4, 5, 6]
2  [7, 8, 9]
        Data Explode
1  [4, 5, 6]       2
2  [7, 8, 9]       3
When adding a series to an existing dataframe, pandas uses the index values to merge in the new values. Notice that the values in series start at 1, but the Explode column in the dataframe starts at 2. This is because the first index label in df is 1, and series.loc[1] == 2.
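To see the label matching in isolation, here is a small sketch with made-up data (the frame, the values series, and the column names ByLabel and ByPosition are all hypothetical). The series index is deliberately reversed, so matching by label and matching by position give different columns:

```python
import pandas as pd

df = pd.DataFrame({"Data": ["a", "b", "c"]})        # index labels 0, 1, 2
values = pd.Series([10, 20, 30], index=[2, 1, 0])   # same labels, reversed

# A Series is matched to the frame row by row using index labels.
df["ByLabel"] = values

# A plain NumPy array has no index, so it is placed by position.
df["ByPosition"] = values.to_numpy()

print(df)
```

Converting to an array is only safe when the values are already in the same order as the rows of the frame.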

So you are getting an error because pandas does not know what to do with the duplicate index values created by explode(). That makes sense. How else would you collate other than using the row index values?
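If each row should end up with a single value, one option is to collapse the exploded values back to one per original row before assigning, so the index is unique again. This is only a sketch: the sample data is made up, first_value is an illustrative name, and keeping the first non-missing 'value' per row is an assumption about what is wanted when a list holds several dictionaries.

```python
import pandas as pd

# Hypothetical data: row 1 holds two dictionaries, so explode() repeats label 1.
df = pd.DataFrame({"Data": [10, [{"value": "A"}, {"value": "B"}], 30]})

values = df["Data"].explode().str["value"]   # index is 0, 1, 1, 2

# Group on the original row labels and keep the first non-missing value,
# which leaves exactly one entry per row of df.
first_value = values.groupby(level=0).first()

df["new"] = first_value
print(df)
```

Because the grouped series keeps the original row labels, the values stay attached to the rows they came from.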
#3
This is my first time using this forum, and that not only worked, it's the best explanation I've ever gotten for fixing a problem. Thank you so much.