Python Forum
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Adding PD DataFrame column
#1
Moring All.

I'm a bit stuck when adding a new column to a DF. This can get a little confusing, but will try to keep it simple.

I'm working with a heavily nested API result set that I'm converting to json in order to extract correct parts from the result set. But I'm struggling to add proper values when the length of values does not match the length of the index. Every record beyond the length of values (pulled from the result set) is NaN.

Data: My data (result) set has 116 "line Item ID's", but each line item can have multiple shipment ID's (217 in this example), i.e.
Output:
{ "metadata": { "startIndex": 1, "pageSize": null, "totalItems": 116 }, "lineItem": { "lineItemType": "FIXTURE", "lineItemId": 1017295, "type": "S4" }, "lineItemShipment": [ { "id": 543515, "quantity": 4, "estimatedDate": "2023-09-19", "statusString": "Shipment", "updateKey": "44D7D5DAC7647C3B7B240E137ECEB226", "suppress": true }, { "id": 543519, "quantity": 0, "estimatedDate": null, "statusString": "In Stock", "updateKey": "0ECEF3163662703B77920826C3B2D7ED", "suppress": true } ] }
What I'm wanting to add "lineItemId": 1017295 to each array in lineItemShipment. The column is added correctly, but only the first shipment record gets an ItemId, the 2nd gets NaN.

Code snippet
jm_line_req = requests.get(jm_line_url, headers=jmHeader)

jmLine_df = pd.DataFrame(jm_line_req.json()['lineItem']

lineItemShipment = pd.json_normalize(jm_json, record_path=['lineItem', 'lineItemShipment']) (extracting the shippment data)

'adding new column to lineItemShipment'
lineItemShipment['lineItemId'] = jmLine_df['lineItemId']
Any help is greatly appreciated!

Please let me know if I need to provide additional information/code.

Thanks.
Bsben
Reply
#2
Workable solution, maybe not ideal or quickest, but iterating through rows is working.

def add_line_item_id_to_shipments(data):
    # Extract relevant information
    for index, row in data.iterrows():
        line_item_id = data["lineItemId"][index]
        shipments = data["lineItemShipment"][index]

    # Add lineItemId to each shipment
        for shipment in shipments:
            shipment["lineItemId"] = line_item_id

    return data
was really hoping could just add the column in the Dataframe, but ran out of ideas...specially since I'm not strong in pd Dataframes.
buran write Mar-08-2024, 05:50 PM:
Please, use proper tags when post code, traceback, output, etc. This time I have added tags for you.
See BBcode help for more info.
Reply
#3
Are your trying to do this?
import pandas as pd
import json


data = json.loads("""{
    "metadata": {
        "startIndex": 1,
        "pageSize": null,
        "totalItems": 116
    },
    "lineItem": {
        "lineItemType": "FIXTURE",
        "lineItemId": 1017295,
        "type": "S4",
        "lineItemShipment": [
            {
                "id": 543515,
                "quantity": 4,
                "estimatedDate": "2023-09-19",
                "statusString": "Shipment",
                "updateKey": "44D7D5DAC7647C3B7B240E137ECEB226",
                "suppress": true
            },
            {
                "id": 543519,
                "quantity": 0,
                "estimatedDate": null,
                "statusString": "In Stock",
                "updateKey": "0ECEF3163662703B77920826C3B2D7ED",
                "suppress": true
            }
        ]
    }
}""")
df = pd.DataFrame(data["lineItem"])
df_fixed = df.join(pd.json_normalize(df.lineItemShipment)).drop('lineItemShipment', axis='columns')
print(df_fixed)
Output:
lineItemType lineItemId type id quantity estimatedDate statusString updateKey suppress 0 FIXTURE 1017295 S4 543515 4 2023-09-19 Shipment 44D7D5DAC7647C3B7B240E137ECEB226 True 1 FIXTURE 1017295 S4 543519 0 None In Stock 0ECEF3163662703B77920826C3B2D7ED True
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Python Alteryx QS-Passing pandas dataframe column inside SQL query where condition sanky1990 0 747 Dec-04-2023, 09:48 PM
Last Post: sanky1990
  Difference one column in a dataframe Scott 0 645 Feb-10-2023, 08:41 AM
Last Post: Scott
  splitting a Dataframe Column in two parts nafshar 2 961 Jan-30-2023, 01:19 PM
Last Post: nafshar
  How to assign a value to pandas dataframe column rows based on a condition klllmmm 0 849 Sep-08-2022, 06:32 AM
Last Post: klllmmm
  renaming the 0 column in a dataframe Led_Zeppelin 5 1,563 Aug-16-2022, 04:07 PM
Last Post: deanhystad
  Copy a column from one dataframe to another dataframe Led_Zeppelin 17 11,458 Jul-08-2022, 08:40 PM
Last Post: deanhystad
  Cannot convert the series to <class 'int'> when trying to create new dataframe column Mark17 3 8,547 Jan-20-2022, 05:15 PM
Last Post: deanhystad
  Filter dataframe by datetime.date column glidecode 2 5,182 Dec-05-2021, 12:51 AM
Last Post: glidecode
  Adding another condition for df column comparison Mark17 2 1,675 Sep-30-2021, 03:54 PM
Last Post: Mark17
  Pandas DataFrame combine rows by column value, where Date Rows are NULL rhat398 0 2,124 May-04-2021, 10:51 PM
Last Post: rhat398

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020