Python Forum

Moring All.

I'm a bit stuck when adding a new column to a DF. This can get a little confusing, but will try to keep it simple.

I'm working with a heavily nested API result set that I'm converting to json in order to extract correct parts from the result set. But I'm struggling to add proper values when the length of values does not match the length of the index. Every record beyond the length of values (pulled from the result set) is NaN.

Data: My data (result) set has 116 "line Item ID's", but each line item can have multiple shipment ID's (217 in this example), i.e.

Output: {
  "metadata": {
    "startIndex": 1,
    "pageSize": null,
    "totalItems": 116
  },
  "lineItem": 
     {
      "lineItemType": "FIXTURE",
      "lineItemId": 1017295,
      "type": "S4"
	  },
 	"lineItemShipment": [
        {
          "id": 543515,
          "quantity": 4,
          "estimatedDate": "2023-09-19",
          "statusString": "Shipment",
          "updateKey": "44D7D5DAC7647C3B7B240E137ECEB226",
          "suppress": true
        },
        {
          "id": 543519,
          "quantity": 0,
          "estimatedDate": null,
          "statusString": "In Stock",
          "updateKey": "0ECEF3163662703B77920826C3B2D7ED",
          "suppress": true
        }
      ] 
  }

What I'm wanting to add "lineItemId": 1017295 to each array in lineItemShipment. The column is added correctly, but only the first shipment record gets an ItemId, the 2nd gets NaN.

Code snippet

jm_line_req = requests.get(jm_line_url, headers=jmHeader)

jmLine_df = pd.DataFrame(jm_line_req.json()['lineItem']

lineItemShipment = pd.json_normalize(jm_json, record_path=['lineItem', 'lineItemShipment']) (extracting the shippment data)

'adding new column to lineItemShipment'
lineItemShipment['lineItemId'] = jmLine_df['lineItemId']

Any help is greatly appreciated!

Please let me know if I need to provide additional information/code.

Thanks.
Bsben

Workable solution, maybe not ideal or quickest, but iterating through rows is working.

def add_line_item_id_to_shipments(data):
    # Extract relevant information
    for index, row in data.iterrows():
        line_item_id = data["lineItemId"][index]
        shipments = data["lineItemShipment"][index]

    # Add lineItemId to each shipment
        for shipment in shipments:
            shipment["lineItemId"] = line_item_id

    return data

was really hoping could just add the column in the Dataframe, but ran out of ideas...specially since I'm not strong in pd Dataframes.

Are your trying to do this?

import pandas as pd
import json


data = json.loads("""{
    "metadata": {
        "startIndex": 1,
        "pageSize": null,
        "totalItems": 116
    },
    "lineItem": {
        "lineItemType": "FIXTURE",
        "lineItemId": 1017295,
        "type": "S4",
        "lineItemShipment": [
            {
                "id": 543515,
                "quantity": 4,
                "estimatedDate": "2023-09-19",
                "statusString": "Shipment",
                "updateKey": "44D7D5DAC7647C3B7B240E137ECEB226",
                "suppress": true
            },
            {
                "id": 543519,
                "quantity": 0,
                "estimatedDate": null,
                "statusString": "In Stock",
                "updateKey": "0ECEF3163662703B77920826C3B2D7ED",
                "suppress": true
            }
        ]
    }
}""")
df = pd.DataFrame(data["lineItem"])
df_fixed = df.join(pd.json_normalize(df.lineItemShipment)).drop('lineItemShipment', axis='columns')
print(df_fixed)

Output:  lineItemType  lineItemId type      id  quantity estimatedDate statusString                         updateKey  suppress
0      FIXTURE     1017295   S4  543515         4    2023-09-19     Shipment  44D7D5DAC7647C3B7B240E137ECEB226      True
1      FIXTURE     1017295   S4  543519         0          None     In Stock  0ECEF3163662703B77920826C3B2D7ED      True

bsben

bsben

deanhystad