Python Forum

Hello All,
I am trying to parse complex JSON String from AWS Pricing JSON file and convert it into CSV file based on some conditions. I was able to achieve the parsing code implementation using for loops and pandas, but need more efficient way of implementing the code as the performance of the code is really not good. I wanted understand how we can parse complex JSON without using for loops or if we have any other efficient way to implement the code, I tried using JSON normalize in pandas also. I would really appreciate the inputs.
Below is the code snippet I have used -

def create_all_ri_regions_pricing_file():
    with open(INDEX_JSON_1) as pricing:
        read_content = json.load(pricing)
        reserved_df = read_content["terms"]["Reserved"]
        df_final = pd.DataFrame()
        for key in reserved_df:
            for key2 in reserved_df[key]:
                df = pd.DataFrame(reserved_df[key][key2]).reset_index(drop=True)
                print("Data frame 0--> ", df.columns)
                df["LeaseContractLength"] = reserved_df[key][key2]["termAttributes"]["LeaseContractLength"]
                df["PurchaseOption"] = reserved_df[key][key2]["termAttributes"]["PurchaseOption"]
                df["OfferingClass"] = reserved_df[key][key2]["termAttributes"]["OfferingClass"]
                df = df.drop(columns=["termAttributes"])
                df = pd.concat([df["priceDimensions"].apply(pd.Series), df.drop("priceDimensions", axis=1)], axis=1)
                df = df[
                    [
                        "offerTermCode",
                        "sku",
                        "LeaseContractLength",
                        "OfferingClass",
                        "PurchaseOption",
                        "rateCode",
                        "description",
                        "unit",
                        "pricePerUnit",
                    ]
                ]
                print("Data frame --> ", df)
                df_final = pd.concat([df_final, df]).reset_index(drop=True)
        df_final = df_final.loc[
            (df_final["LeaseContractLength"] == "1yr")
            & (df_final["OfferingClass"] == "standard")
            & (df_final["PurchaseOption"] == "All Upfront")
            & (df_final["unit"] == "Hrs")
        ]
        # print("final Data frame --> ", df_final)
        df_final.to_csv("./final.csv")

I have attached the JSON file to be parsed and expected CSV file for the reference.

None of the items in the index sample seem to match up with the expectedCSVFile.csv contents.
If I am wrong, let me know, otherwise please re-post matched samples.

New Edit:
Also, I see multiple 'priceDimensions' in the json file.
Do you expect a separate entry in the csv file for each (as I would expect)?

Another edit:
I also see that pricePerUnit could potentially contain many values.
This leads me to the question, why do you want to move the data to csv file, when the json data is much more navigable?

Oops. Replied to wrong post.

The following code will give you a CSV file with each entry in the json file normalized into one or more csv records.
The json file would be much easier to use.
I love pandas, but think it's an additional layer that is not necessary in this instance, so didn't use it.
the jason file used was the sample file you posted.

import os
from pathlib import Path
from copy import deepcopy
import csv
import json
import sys


class parse_index:
    def __init__(self):
        # files are in same path as script, assure path here.
        os.chdir(os.path.abspath(os.path.dirname(__file__)))

        filepath = Path('.')
        jsonfile = filepath / 'index.json'
        with  jsonfile.open() as fp:            
            self.jindex = json.load(fp)
        self.csvfile = filepath / 'newcsvfile.csv'

    def create_csv(self):
        header = ["item no", "offerTermCode", "sku", "LeaseContractLength",
                    "OfferingClass", "PurchaseOption", "rateCode",
                    "description", "unit", "pricePerUnit"]
        base = self.jindex["terms"]["Reserved"]
        with self.csvfile.open('w') as cfptr:
            cwrite = csv.writer(cfptr, delimiter=',').writerow
            cwrite(header)

            seqno = 1
            for key in base.keys():
                # print(f"key: {key}")
                bk = base[key]
                for key1 in bk.keys():
                    bk1 = bk[key1]
                
                    buffer1 = []
                    buffer1.append(bk1['offerTermCode'])
                    buffer1.append(bk1['sku'])
                    buffer1.append(bk1['termAttributes']['LeaseContractLength'])
                    buffer1.append(bk1['termAttributes']['OfferingClass'])
                    buffer1.append(bk1['termAttributes']['PurchaseOption'])

                    # dynamic information
                    for key2 in bk1['priceDimensions'].keys():
                        buffer2 = deepcopy(buffer1)
                        bk2 = bk1['priceDimensions'][key2]
                        buffer2.append(bk2["rateCode"])
                        buffer2.append(bk2["description"])
                        unit = bk2['unit']
                        bk3 = bk2['pricePerUnit']
                        for key3, value in bk3.items():
                            buffer3 = [seqno] + deepcopy(buffer2)
                            buffer3.append(f"{key3} {value}")
                            # print(f"buffer3: {buffer3}")
                            cwrite(buffer3)
                            seqno += 1


def main():
    pi = parse_index()
    pi.create_csv()


if __name__ == '__main__':
    main()

csv file (newcsvfile.csv) contents:

Output:item no,offerTermCode,sku,LeaseContractLength,OfferingClass,PurchaseOption,rateCode,description,unit,pricePerUnit
1,MZU6U2429S,WS5A8EPNGYV4ZCR3,3yr,convertible,All Upfront,WS5A8EPNGYV4ZCR3.MZU6U2429S.2TG2D8R56U,Upfront Fee,USD 89879
2,MZU6U2429S,WS5A8EPNGYV4ZCR3,3yr,convertible,All Upfront,WS5A8EPNGYV4ZCR3.MZU6U2429S.6YS6EN2CT7,"USD 0.0 per Windows (Amazon VPC), m6a.12xlarge reserved instance applied",USD 0.0000000000
3,4NA7Y494T4,WS5A8EPNGYV4ZCR3,1yr,standard,No Upfront,WS5A8EPNGYV4ZCR3.4NA7Y494T4.6YS6EN2CT7,"Windows (Amazon VPC), m6a.12xlarge reserved instance applied",USD 3.8897400000
4,R5XV2EPZQZ,WS5A8EPNGYV4ZCR3,3yr,convertible,Partial Upfront,WS5A8EPNGYV4ZCR3.R5XV2EPZQZ.2TG2D8R56U,Upfront Fee,USD 45265
5,R5XV2EPZQZ,WS5A8EPNGYV4ZCR3,3yr,convertible,Partial Upfront,WS5A8EPNGYV4ZCR3.R5XV2EPZQZ.6YS6EN2CT7,"Windows (Amazon VPC), m6a.12xlarge reserved instance applied",USD 1.7224000000
6,38NPMPTW36,WS5A8EPNGYV4ZCR3,3yr,standard,Partial Upfront,WS5A8EPNGYV4ZCR3.38NPMPTW36.2TG2D8R56U,Upfront Fee,USD 43044
7,38NPMPTW36,WS5A8EPNGYV4ZCR3,3yr,standard,Partial Upfront,WS5A8EPNGYV4ZCR3.38NPMPTW36.6YS6EN2CT7,"Windows (Amazon VPC), m6a.12xlarge reserved instance applied",USD 1.6378900000
8,7NE97W5U4E,WS5A8EPNGYV4ZCR3,1yr,convertible,No Upfront,WS5A8EPNGYV4ZCR3.7NE97W5U4E.6YS6EN2CT7,"Windows (Amazon VPC), m6a.12xlarge reserved instance applied",USD 4.1377900000
9,VJWZNREJX2,WS5A8EPNGYV4ZCR3,1yr,convertible,All Upfront,WS5A8EPNGYV4ZCR3.VJWZNREJX2.2TG2D8R56U,Upfront Fee,USD 35120
10,VJWZNREJX2,WS5A8EPNGYV4ZCR3,1yr,convertible,All Upfront,WS5A8EPNGYV4ZCR3.VJWZNREJX2.6YS6EN2CT7,"USD 0.0 per Windows (Amazon VPC), m6a.12xlarge reserved instance applied",USD 0.0000000000
11,MZU6U2429S,Y6PPUWXG9ZYTC3KY,3yr,convertible,All Upfront,Y6PPUWXG9ZYTC3KY.MZU6U2429S.2TG2D8R56U,Upfront Fee,USD 65808
12,MZU6U2429S,Y6PPUWXG9ZYTC3KY,3yr,convertible,All Upfront,Y6PPUWXG9ZYTC3KY.MZU6U2429S.6YS6EN2CT7,"USD 0.0 per Red Hat Enterprise Linux (Amazon VPC), c5ad.24xlarge reserved instance applied",USD 0.0000000000
13,NQ3QZPMQV9,Y6PPUWXG9ZYTC3KY,3yr,standard,All Upfront,Y6PPUWXG9ZYTC3KY.NQ3QZPMQV9.2TG2D8R56U,Upfront Fee,USD 49898
14,NQ3QZPMQV9,Y6PPUWXG9ZYTC3KY,3yr,standard,All Upfront,Y6PPUWXG9ZYTC3KY.NQ3QZPMQV9.6YS6EN2CT7,"USD 0.0 per Red Hat Enterprise Linux (Amazon VPC), c5ad.24xlarge reserved instance applied",USD 0.0000000000

This worked like a charm Smile

Thank you for your inputs Smile

rwalde

Larz60+

deanhystad

Larz60+

rwalde