Parse Nested JSON String in Python

rwalde · Sep-02-2022, 06:34 AM

Hello All,
I am trying to parse complex JSON String from AWS Pricing JSON file and convert it into CSV file based on some conditions. I was able to achieve the parsing code implementation using for loops and pandas, but need more efficient way of implementing the code as the performance of the code is really not good. I wanted understand how we can parse complex JSON without using for loops or if we have any other efficient way to implement the code, I tried using JSON normalize in pandas also. I would really appreciate the inputs.
Below is the code snippet I have used -

        
          
          
              
              def create_all_ri_regions_pricing_file():
    with open(INDEX_JSON_1) as pricing:
        read_content = json.load(pricing)
        reserved_df = read_content["terms"]["Reserved"]
        df_final = pd.DataFrame()
        for key in reserved_df:
            for key2 in reserved_df[key]:
                df = pd.DataFrame(reserved_df[key][key2]).reset_index(drop=True)
                print("Data frame 0--> ", df.columns)
                df["LeaseContractLength"] = reserved_df[key][key2]["termAttributes"]["LeaseContractLength"]
                df["PurchaseOption"] = reserved_df[key][key2]["termAttributes"]["PurchaseOption"]
                df["OfferingClass"] = reserved_df[key][key2]["termAttributes"]["OfferingClass"]
                df = df.drop(columns=["termAttributes"])
                df = pd.concat([df["priceDimensions"].apply(pd.Series), df.drop("priceDimensions", axis=1)], axis=1)
                df = df[
                    [
                        "offerTermCode",
                        "sku",
                        "LeaseContractLength",
                        "OfferingClass",
                        "PurchaseOption",
                        "rateCode",
                        "description",
                        "unit",
                        "pricePerUnit",
                    ]
                ]
                print("Data frame --> ", df)
                df_final = pd.concat([df_final, df]).reset_index(drop=True)
        df_final = df_final.loc[
            (df_final["LeaseContractLength"] == "1yr")
            & (df_final["OfferingClass"] == "standard")
            & (df_final["PurchaseOption"] == "All Upfront")
            & (df_final["unit"] == "Hrs")
        ]
        # print("final Data frame --> ", df_final)
        df_final.to_csv("./final.csv")

            

        
      

I have attached the JSON file to be parsed and expected CSV file for the reference.

**Larz60+** · (This post was last modified: Sep-02-2022, 03:28 PM by Larz60+.)

None of the items in the index sample seem to match up with the expectedCSVFile.csv contents.
If I am wrong, let me know, otherwise please re-post matched samples.

New Edit:
Also, I see multiple 'priceDimensions' in the json file.
Do you expect a separate entry in the csv file for each (as I would expect)?

Another edit:
I also see that pricePerUnit could potentially contain many values.
This leads me to the question, why do you want to move the data to csv file, when the json data is much more navigable?

**deanhystad** · (This post was last modified: Sep-02-2022, 05:27 PM by deanhystad.)

Oops. Replied to wrong post.

**Larz60+** · Sep-02-2022, 06:39 PM

The following code will give you a CSV file with each entry in the json file normalized into one or more csv records.
The json file would be much easier to use.
I love pandas, but think it's an additional layer that is not necessary in this instance, so didn't use it.
the jason file used was the sample file you posted.

        
          
          
              
              import os
from pathlib import Path
from copy import deepcopy
import csv
import json
import sys
 
 
class parse_index:
    def __init__(self):
        # files are in same path as script, assure path here.
        os.chdir(os.path.abspath(os.path.dirname(__file__)))
 
        filepath = Path('.')
        jsonfile = filepath / 'index.json'
        with  jsonfile.open() as fp:            
            self.jindex = json.load(fp)
        self.csvfile = filepath / 'newcsvfile.csv'
 
    def create_csv(self):
        header = ["item no", "offerTermCode", "sku", "LeaseContractLength",
                    "OfferingClass", "PurchaseOption", "rateCode",
                    "description", "unit", "pricePerUnit"]
        base = self.jindex["terms"]["Reserved"]
        with self.csvfile.open('w') as cfptr:
            cwrite = csv.writer(cfptr, delimiter=',').writerow
            cwrite(header)
 
            seqno = 1
            for key in base.keys():
                # print(f"key: {key}")
                bk = base[key]
                for key1 in bk.keys():
                    bk1 = bk[key1]
                 
                    buffer1 = []
                    buffer1.append(bk1['offerTermCode'])
                    buffer1.append(bk1['sku'])
                    buffer1.append(bk1['termAttributes']['LeaseContractLength'])
                    buffer1.append(bk1['termAttributes']['OfferingClass'])
                    buffer1.append(bk1['termAttributes']['PurchaseOption'])
 
                    # dynamic information
                    for key2 in bk1['priceDimensions'].keys():
                        buffer2 = deepcopy(buffer1)
                        bk2 = bk1['priceDimensions'][key2]
                        buffer2.append(bk2["rateCode"])
                        buffer2.append(bk2["description"])
                        unit = bk2['unit']
                        bk3 = bk2['pricePerUnit']
                        for key3, value in bk3.items():
                            buffer3 = [seqno] + deepcopy(buffer2)
                            buffer3.append(f"{key3} {value}")
                            # print(f"buffer3: {buffer3}")
                            cwrite(buffer3)
                            seqno += 1
 
 
def main():
    pi = parse_index()
    pi.create_csv()
 
 
if __name__ == '__main__':
    main()

            

        
      

csv file (newcsvfile.csv) contents:

Output:item no,offerTermCode,sku,LeaseContractLength,OfferingClass,PurchaseOption,rateCode,description,unit,pricePerUnit
1,MZU6U2429S,WS5A8EPNGYV4ZCR3,3yr,convertible,All Upfront,WS5A8EPNGYV4ZCR3.MZU6U2429S.2TG2D8R56U,Upfront Fee,USD 89879
2,MZU6U2429S,WS5A8EPNGYV4ZCR3,3yr,convertible,All Upfront,WS5A8EPNGYV4ZCR3.MZU6U2429S.6YS6EN2CT7,"USD 0.0 per Windows (Amazon VPC), m6a.12xlarge reserved instance applied",USD 0.0000000000
3,4NA7Y494T4,WS5A8EPNGYV4ZCR3,1yr,standard,No Upfront,WS5A8EPNGYV4ZCR3.4NA7Y494T4.6YS6EN2CT7,"Windows (Amazon VPC), m6a.12xlarge reserved instance applied",USD 3.8897400000
4,R5XV2EPZQZ,WS5A8EPNGYV4ZCR3,3yr,convertible,Partial Upfront,WS5A8EPNGYV4ZCR3.R5XV2EPZQZ.2TG2D8R56U,Upfront Fee,USD 45265
5,R5XV2EPZQZ,WS5A8EPNGYV4ZCR3,3yr,convertible,Partial Upfront,WS5A8EPNGYV4ZCR3.R5XV2EPZQZ.6YS6EN2CT7,"Windows (Amazon VPC), m6a.12xlarge reserved instance applied",USD 1.7224000000
6,38NPMPTW36,WS5A8EPNGYV4ZCR3,3yr,standard,Partial Upfront,WS5A8EPNGYV4ZCR3.38NPMPTW36.2TG2D8R56U,Upfront Fee,USD 43044
7,38NPMPTW36,WS5A8EPNGYV4ZCR3,3yr,standard,Partial Upfront,WS5A8EPNGYV4ZCR3.38NPMPTW36.6YS6EN2CT7,"Windows (Amazon VPC), m6a.12xlarge reserved instance applied",USD 1.6378900000
8,7NE97W5U4E,WS5A8EPNGYV4ZCR3,1yr,convertible,No Upfront,WS5A8EPNGYV4ZCR3.7NE97W5U4E.6YS6EN2CT7,"Windows (Amazon VPC), m6a.12xlarge reserved instance applied",USD 4.1377900000
9,VJWZNREJX2,WS5A8EPNGYV4ZCR3,1yr,convertible,All Upfront,WS5A8EPNGYV4ZCR3.VJWZNREJX2.2TG2D8R56U,Upfront Fee,USD 35120
10,VJWZNREJX2,WS5A8EPNGYV4ZCR3,1yr,convertible,All Upfront,WS5A8EPNGYV4ZCR3.VJWZNREJX2.6YS6EN2CT7,"USD 0.0 per Windows (Amazon VPC), m6a.12xlarge reserved instance applied",USD 0.0000000000
11,MZU6U2429S,Y6PPUWXG9ZYTC3KY,3yr,convertible,All Upfront,Y6PPUWXG9ZYTC3KY.MZU6U2429S.2TG2D8R56U,Upfront Fee,USD 65808
12,MZU6U2429S,Y6PPUWXG9ZYTC3KY,3yr,convertible,All Upfront,Y6PPUWXG9ZYTC3KY.MZU6U2429S.6YS6EN2CT7,"USD 0.0 per Red Hat Enterprise Linux (Amazon VPC), c5ad.24xlarge reserved instance applied",USD 0.0000000000
13,NQ3QZPMQV9,Y6PPUWXG9ZYTC3KY,3yr,standard,All Upfront,Y6PPUWXG9ZYTC3KY.NQ3QZPMQV9.2TG2D8R56U,Upfront Fee,USD 49898
14,NQ3QZPMQV9,Y6PPUWXG9ZYTC3KY,3yr,standard,All Upfront,Y6PPUWXG9ZYTC3KY.NQ3QZPMQV9.6YS6EN2CT7,"USD 0.0 per Red Hat Enterprise Linux (Amazon VPC), c5ad.24xlarge reserved instance applied",USD 0.0000000000

rwalde · Sep-08-2022, 10:32 AM

This worked like a charm Smile

Thank you for your inputs Smile

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Indirectlty convert string to float in JSON file	WBPYTHON	6	8,237	May-06-2020, 12:09 PM Last Post: WBPYTHON
	Parse XML String in Pandas Dataframe	creedX	2	8,437	Dec-09-2019, 07:35 PM Last Post: creedX
	parse json output to simple text or variable	murali_datascience	1	3,052	Jun-25-2019, 02:56 PM Last Post: perfringo
	Read Nested JSON with pandas.io.json	palo173	4	10,915	Apr-29-2019, 01:25 PM Last Post: palo173
	Pandas nested json data to dataframe	FrankC	1	11,071	Aug-14-2018, 01:37 AM Last Post: scidam
	Issue in getting the required data from nested json and store it in a pandas datafram	PrateekG	2	4,071	May-20-2018, 11:25 AM Last Post: scidam

Parse Nested JSON String in Python

User Panel Messages

Announcements