How to most effectively unpack list of name-value pair dictionaries in a dataframe? - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: Data Science (https://python-forum.io/forum-44.html) +--- Thread: How to most effectively unpack list of name-value pair dictionaries in a dataframe? (/thread-41084.html) |
How to most effectively unpack list of name-value pair dictionaries in a dataframe? - zlim - Nov-07-2023 Hello! I'm working with a dataset that has a rather inconvenient format where one of the columns is basically a list of name-value pair dictionaries. I would like to expand that column such that each of the names is it's own column. So far, I've found a way to do it by manually extracting each of the values, but ideally, I would prefer a more general solution that is also efficient. Here's an example: import pandas as pd data = {'name': ['Alice', 'Bob', 'Clark'], 'preferences': [[{'name': 'fruit', 'value': 'apple'}, {'name': 'drink', 'value': 'lemonade'}, {'name': 'food', 'value': 'pizza'}], [{'name': 'fruit', 'value': 'orange'}, {'name': 'drink', 'value': 'soda'}, {'name': 'food', 'value': 'soup'}], [{'name': 'fruit', 'value': 'pear'}, {'name': 'drink', 'value': 'water'}, {'name': 'food', 'value': 'chicken'}]]} df = pd.DataFrame(data) # Extract values from 'preferences' column df['fruit'] = df['preferences'].apply(lambda x: [item['value'] for item in x if item['name'] == 'fruit'][0]) df['drink'] = df['preferences'].apply(lambda x: [item['value'] for item in x if item['name'] == 'drink'][0]) df['food'] = df['preferences'].apply(lambda x: [item['value'] for item in x if item['name'] == 'food'][0]) # Drop the 'preferences' column df = df.drop(columns=['preferences'])An additional complication is that not every column has the same name-value pairs. In that case, the method above fails (IndexError) without doing an additional check, which is even more inefficient. Maybe the solution is to use pd.json_normalize on the preferences column, pivot that, then append the various dataframes? RE: How to most effectively unpack list of name-value pair dictionaries in a dataframe? - zlim - Nov-07-2023 Updated code to account for potential gaps: import pandas as pd import numpy as np def extract_value(l, name): extracted = [item['value'] for item in l if item['name'] == name] if len(extracted) == 0: return np.nan else: return extracted[0] data = {'name': ['Alice', 'Bob', 'Clark'], 'preferences': [[{'name': 'fruit', 'value': 'apple'}, {'name': 'drink', 'value': 'lemonade'}, {'name': 'food', 'value': 'pizza'}], [{'name': 'fruit', 'value': 'orange'}, {'name': 'drink', 'value': 'soda'}, {'name': 'food', 'value': 'soup'}], [{'name': 'fruit', 'value': 'pear'}, {'name': 'food', 'value': 'chicken'}]]} df = pd.DataFrame(data) # Extract values from 'preferences' column df['fruit'] = df['preferences'].apply(lambda x: extract_value(x, 'fruit')) df['drink'] = df['preferences'].apply(lambda x: extract_value(x, 'drink')) df['food'] = df['preferences'].apply(lambda x: extract_value(x, 'food')) # Drop the 'preferences' column df = df.drop(columns=['preferences']) |