Adding PD DataFrame column

bsben · (This post was last modified: Mar-08-2024, 04:36 PM by deanhystad.)

Moring All.

I'm a bit stuck when adding a new column to a DF. This can get a little confusing, but will try to keep it simple.

I'm working with a heavily nested API result set that I'm converting to json in order to extract correct parts from the result set. But I'm struggling to add proper values when the length of values does not match the length of the index. Every record beyond the length of values (pulled from the result set) is NaN.

Data: My data (result) set has 116 "line Item ID's", but each line item can have multiple shipment ID's (217 in this example), i.e.

Output: {
  "metadata": {
    "startIndex": 1,
    "pageSize": null,
    "totalItems": 116
  },
  "lineItem": 
     {
      "lineItemType": "FIXTURE",
      "lineItemId": 1017295,
      "type": "S4"
	  },
 	"lineItemShipment": [
        {
          "id": 543515,
          "quantity": 4,
          "estimatedDate": "2023-09-19",
          "statusString": "Shipment",
          "updateKey": "44D7D5DAC7647C3B7B240E137ECEB226",
          "suppress": true
        },
        {
          "id": 543519,
          "quantity": 0,
          "estimatedDate": null,
          "statusString": "In Stock",
          "updateKey": "0ECEF3163662703B77920826C3B2D7ED",
          "suppress": true
        }
      ] 
  }

What I'm wanting to add "lineItemId": 1017295 to each array in lineItemShipment. The column is added correctly, but only the first shipment record gets an ItemId, the 2nd gets NaN.

Code snippet

jm_line_req = requests.get(jm_line_url, headers=jmHeader)

jmLine_df = pd.DataFrame(jm_line_req.json()['lineItem']

lineItemShipment = pd.json_normalize(jm_json, record_path=['lineItem', 'lineItemShipment']) (extracting the shippment data)

'adding new column to lineItemShipment'
lineItemShipment['lineItemId'] = jmLine_df['lineItemId']

Any help is greatly appreciated!

Please let me know if I need to provide additional information/code.

Thanks.
Bsben

bsben · (This post was last modified: Mar-08-2024, 05:50 PM by buran.)

Workable solution, maybe not ideal or quickest, but iterating through rows is working.

def add_line_item_id_to_shipments(data):
    # Extract relevant information
    for index, row in data.iterrows():
        line_item_id = data["lineItemId"][index]
        shipments = data["lineItemShipment"][index]

    # Add lineItemId to each shipment
        for shipment in shipments:
            shipment["lineItemId"] = line_item_id

    return data

was really hoping could just add the column in the Dataframe, but ran out of ideas...specially since I'm not strong in pd Dataframes.

buran write Mar-08-2024, 05:50 PM:
Please, use proper tags when post code, traceback, output, etc. This time I have added tags for you.
See BBcode help for more info.

**deanhystad** · Mar-08-2024, 10:46 PM

Are your trying to do this?

import pandas as pd
import json


data = json.loads("""{
    "metadata": {
        "startIndex": 1,
        "pageSize": null,
        "totalItems": 116
    },
    "lineItem": {
        "lineItemType": "FIXTURE",
        "lineItemId": 1017295,
        "type": "S4",
        "lineItemShipment": [
            {
                "id": 543515,
                "quantity": 4,
                "estimatedDate": "2023-09-19",
                "statusString": "Shipment",
                "updateKey": "44D7D5DAC7647C3B7B240E137ECEB226",
                "suppress": true
            },
            {
                "id": 543519,
                "quantity": 0,
                "estimatedDate": null,
                "statusString": "In Stock",
                "updateKey": "0ECEF3163662703B77920826C3B2D7ED",
                "suppress": true
            }
        ]
    }
}""")
df = pd.DataFrame(data["lineItem"])
df_fixed = df.join(pd.json_normalize(df.lineItemShipment)).drop('lineItemShipment', axis='columns')
print(df_fixed)

Output:  lineItemType  lineItemId type      id  quantity estimatedDate statusString                         updateKey  suppress
0      FIXTURE     1017295   S4  543515         4    2023-09-19     Shipment  44D7D5DAC7647C3B7B240E137ECEB226      True
1      FIXTURE     1017295   S4  543519         0          None     In Stock  0ECEF3163662703B77920826C3B2D7ED      True

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	renaming a column without a name in a dataframe	Carbonpony	2	811	Jan-23-2025, 08:20 AM Last Post: Carbonpony
	adding a calculated column	charles986	1	872	Jun-13-2024, 05:31 PM Last Post: deanhystad
	adding a calculated column	charles986	1	675	Jun-13-2024, 02:27 PM Last Post: deanhystad
	Python Alteryx QS-Passing pandas dataframe column inside SQL query where condition	sanky1990	0	1,346	Dec-04-2023, 09:48 PM Last Post: sanky1990
	Difference one column in a dataframe	Scott	0	1,148	Feb-10-2023, 08:41 AM Last Post: Scott
	splitting a Dataframe Column in two parts	nafshar	2	1,769	Jan-30-2023, 01:19 PM Last Post: nafshar
	How to assign a value to pandas dataframe column rows based on a condition	klllmmm	0	1,846	Sep-08-2022, 06:32 AM Last Post: klllmmm
	renaming the 0 column in a dataframe	Led_Zeppelin	5	6,597	Aug-16-2022, 04:07 PM Last Post: deanhystad
	Copy a column from one dataframe to another dataframe	Led_Zeppelin	17	19,727	Jul-08-2022, 08:40 PM Last Post: deanhystad
	Cannot convert the series to <class 'int'> when trying to create new dataframe column	Mark17	3	10,296	Jan-20-2022, 05:15 PM Last Post: deanhystad

Adding PD DataFrame column

User Panel Messages

Announcements