Python Forum

Full Version: Adding PD DataFrame column
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Moring All.

I'm a bit stuck when adding a new column to a DF. This can get a little confusing, but will try to keep it simple.

I'm working with a heavily nested API result set that I'm converting to json in order to extract correct parts from the result set. But I'm struggling to add proper values when the length of values does not match the length of the index. Every record beyond the length of values (pulled from the result set) is NaN.

Data: My data (result) set has 116 "line Item ID's", but each line item can have multiple shipment ID's (217 in this example), i.e.
Output:
{ "metadata": { "startIndex": 1, "pageSize": null, "totalItems": 116 }, "lineItem": { "lineItemType": "FIXTURE", "lineItemId": 1017295, "type": "S4" }, "lineItemShipment": [ { "id": 543515, "quantity": 4, "estimatedDate": "2023-09-19", "statusString": "Shipment", "updateKey": "44D7D5DAC7647C3B7B240E137ECEB226", "suppress": true }, { "id": 543519, "quantity": 0, "estimatedDate": null, "statusString": "In Stock", "updateKey": "0ECEF3163662703B77920826C3B2D7ED", "suppress": true } ] }
What I'm wanting to add "lineItemId": 1017295 to each array in lineItemShipment. The column is added correctly, but only the first shipment record gets an ItemId, the 2nd gets NaN.

Code snippet
jm_line_req = requests.get(jm_line_url, headers=jmHeader)

jmLine_df = pd.DataFrame(jm_line_req.json()['lineItem']

lineItemShipment = pd.json_normalize(jm_json, record_path=['lineItem', 'lineItemShipment']) (extracting the shippment data)

'adding new column to lineItemShipment'
lineItemShipment['lineItemId'] = jmLine_df['lineItemId']
Any help is greatly appreciated!

Please let me know if I need to provide additional information/code.

Thanks.
Bsben
Workable solution, maybe not ideal or quickest, but iterating through rows is working.

def add_line_item_id_to_shipments(data):
    # Extract relevant information
    for index, row in data.iterrows():
        line_item_id = data["lineItemId"][index]
        shipments = data["lineItemShipment"][index]

    # Add lineItemId to each shipment
        for shipment in shipments:
            shipment["lineItemId"] = line_item_id

    return data
was really hoping could just add the column in the Dataframe, but ran out of ideas...specially since I'm not strong in pd Dataframes.
Are your trying to do this?
import pandas as pd
import json


data = json.loads("""{
    "metadata": {
        "startIndex": 1,
        "pageSize": null,
        "totalItems": 116
    },
    "lineItem": {
        "lineItemType": "FIXTURE",
        "lineItemId": 1017295,
        "type": "S4",
        "lineItemShipment": [
            {
                "id": 543515,
                "quantity": 4,
                "estimatedDate": "2023-09-19",
                "statusString": "Shipment",
                "updateKey": "44D7D5DAC7647C3B7B240E137ECEB226",
                "suppress": true
            },
            {
                "id": 543519,
                "quantity": 0,
                "estimatedDate": null,
                "statusString": "In Stock",
                "updateKey": "0ECEF3163662703B77920826C3B2D7ED",
                "suppress": true
            }
        ]
    }
}""")
df = pd.DataFrame(data["lineItem"])
df_fixed = df.join(pd.json_normalize(df.lineItemShipment)).drop('lineItemShipment', axis='columns')
print(df_fixed)
Output:
lineItemType lineItemId type id quantity estimatedDate statusString updateKey suppress 0 FIXTURE 1017295 S4 543515 4 2023-09-19 Shipment 44D7D5DAC7647C3B7B240E137ECEB226 True 1 FIXTURE 1017295 S4 543519 0 None In Stock 0ECEF3163662703B77920826C3B2D7ED True