Python Forum
How to most effectively unpack list of name-value pair dictionaries in a dataframe? - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: How to most effectively unpack list of name-value pair dictionaries in a dataframe? (/thread-41084.html)



How to most effectively unpack list of name-value pair dictionaries in a dataframe? - zlim - Nov-07-2023

Hello! I'm working with a dataset that has a rather inconvenient format where one of the columns is basically a list of name-value pair dictionaries. I would like to expand that column such that each of the names is it's own column. So far, I've found a way to do it by manually extracting each of the values, but ideally, I would prefer a more general solution that is also efficient. Here's an example:

import pandas as pd

data = {'name': ['Alice', 'Bob', 'Clark'],
        'preferences': [[{'name': 'fruit', 'value': 'apple'}, 
                         {'name': 'drink', 'value': 'lemonade'},
                         {'name': 'food', 'value': 'pizza'}],
                        [{'name': 'fruit', 'value': 'orange'}, 
                         {'name': 'drink', 'value': 'soda'},
                         {'name': 'food', 'value': 'soup'}],
                        [{'name': 'fruit', 'value': 'pear'}, 
                         {'name': 'drink', 'value': 'water'},
                         {'name': 'food', 'value': 'chicken'}]]}

df = pd.DataFrame(data)

# Extract values from 'preferences' column
df['fruit'] = df['preferences'].apply(lambda x: [item['value'] for item in x if item['name'] == 'fruit'][0])
df['drink'] = df['preferences'].apply(lambda x: [item['value'] for item in x if item['name'] == 'drink'][0])
df['food'] = df['preferences'].apply(lambda x: [item['value'] for item in x if item['name'] == 'food'][0])

# Drop the 'preferences' column
df = df.drop(columns=['preferences'])
An additional complication is that not every column has the same name-value pairs. In that case, the method above fails (IndexError) without doing an additional check, which is even more inefficient.

Maybe the solution is to use pd.json_normalize on the preferences column, pivot that, then append the various dataframes?


RE: How to most effectively unpack list of name-value pair dictionaries in a dataframe? - zlim - Nov-07-2023

Updated code to account for potential gaps:

import pandas as pd
import numpy as np

def extract_value(l, name):
    extracted = [item['value'] for item in l if item['name'] == name]
    if len(extracted) == 0:
        return np.nan
    else:
        return extracted[0]

data = {'name': ['Alice', 'Bob', 'Clark'],
        'preferences': [[{'name': 'fruit', 'value': 'apple'}, 
                         {'name': 'drink', 'value': 'lemonade'},
                         {'name': 'food', 'value': 'pizza'}],
                        [{'name': 'fruit', 'value': 'orange'}, 
                         {'name': 'drink', 'value': 'soda'},
                         {'name': 'food', 'value': 'soup'}],
                        [{'name': 'fruit', 'value': 'pear'}, 
                         {'name': 'food', 'value': 'chicken'}]]}
 
df = pd.DataFrame(data)
 
# Extract values from 'preferences' column
df['fruit'] = df['preferences'].apply(lambda x: extract_value(x, 'fruit'))
df['drink'] = df['preferences'].apply(lambda x: extract_value(x, 'drink'))
df['food'] = df['preferences'].apply(lambda x: extract_value(x, 'food'))
 
# Drop the 'preferences' column
df = df.drop(columns=['preferences'])