Python Forum
Parse Nested JSON String in Python
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Parse Nested JSON String in Python
#1
Hello All,
I am trying to parse complex JSON String from AWS Pricing JSON file and convert it into CSV file based on some conditions. I was able to achieve the parsing code implementation using for loops and pandas, but need more efficient way of implementing the code as the performance of the code is really not good. I wanted understand how we can parse complex JSON without using for loops or if we have any other efficient way to implement the code, I tried using JSON normalize in pandas also. I would really appreciate the inputs.
Below is the code snippet I have used -

def create_all_ri_regions_pricing_file():
    with open(INDEX_JSON_1) as pricing:
        read_content = json.load(pricing)
        reserved_df = read_content["terms"]["Reserved"]
        df_final = pd.DataFrame()
        for key in reserved_df:
            for key2 in reserved_df[key]:
                df = pd.DataFrame(reserved_df[key][key2]).reset_index(drop=True)
                print("Data frame 0--> ", df.columns)
                df["LeaseContractLength"] = reserved_df[key][key2]["termAttributes"]["LeaseContractLength"]
                df["PurchaseOption"] = reserved_df[key][key2]["termAttributes"]["PurchaseOption"]
                df["OfferingClass"] = reserved_df[key][key2]["termAttributes"]["OfferingClass"]
                df = df.drop(columns=["termAttributes"])
                df = pd.concat([df["priceDimensions"].apply(pd.Series), df.drop("priceDimensions", axis=1)], axis=1)
                df = df[
                    [
                        "offerTermCode",
                        "sku",
                        "LeaseContractLength",
                        "OfferingClass",
                        "PurchaseOption",
                        "rateCode",
                        "description",
                        "unit",
                        "pricePerUnit",
                    ]
                ]
                print("Data frame --> ", df)
                df_final = pd.concat([df_final, df]).reset_index(drop=True)
        df_final = df_final.loc[
            (df_final["LeaseContractLength"] == "1yr")
            & (df_final["OfferingClass"] == "standard")
            & (df_final["PurchaseOption"] == "All Upfront")
            & (df_final["unit"] == "Hrs")
        ]
        # print("final Data frame --> ", df_final)
        df_final.to_csv("./final.csv")
I have attached the JSON file to be parsed and expected CSV file for the reference.

Attached Files

.json   index.json (Size: 8.91 KB / Downloads: 7)
.csv   expectedCSVFile.csv (Size: 10.51 KB / Downloads: 4)
Reply
#2
None of the items in the index sample seem to match up with the expectedCSVFile.csv contents.
If I am wrong, let me know, otherwise please re-post matched samples.

New Edit:
Also, I see multiple 'priceDimensions' in the json file.
Do you expect a separate entry in the csv file for each (as I would expect)?

Another edit:
I also see that pricePerUnit could potentially contain many values.
This leads me to the question, why do you want to move the data to csv file, when the json data is much more navigable?
Reply
#3
Oops. Replied to wrong post.
Reply
#4
The following code will give you a CSV file with each entry in the json file normalized into one or more csv records.
The json file would be much easier to use.
I love pandas, but think it's an additional layer that is not necessary in this instance, so didn't use it.
the jason file used was the sample file you posted.

import os
from pathlib import Path
from copy import deepcopy
import csv
import json
import sys


class parse_index:
    def __init__(self):
        # files are in same path as script, assure path here.
        os.chdir(os.path.abspath(os.path.dirname(__file__)))

        filepath = Path('.')
        jsonfile = filepath / 'index.json'
        with  jsonfile.open() as fp:            
            self.jindex = json.load(fp)
        self.csvfile = filepath / 'newcsvfile.csv'

    def create_csv(self):
        header = ["item no", "offerTermCode", "sku", "LeaseContractLength",
                    "OfferingClass", "PurchaseOption", "rateCode",
                    "description", "unit", "pricePerUnit"]
        base = self.jindex["terms"]["Reserved"]
        with self.csvfile.open('w') as cfptr:
            cwrite = csv.writer(cfptr, delimiter=',').writerow
            cwrite(header)

            seqno = 1
            for key in base.keys():
                # print(f"key: {key}")
                bk = base[key]
                for key1 in bk.keys():
                    bk1 = bk[key1]
                
                    buffer1 = []
                    buffer1.append(bk1['offerTermCode'])
                    buffer1.append(bk1['sku'])
                    buffer1.append(bk1['termAttributes']['LeaseContractLength'])
                    buffer1.append(bk1['termAttributes']['OfferingClass'])
                    buffer1.append(bk1['termAttributes']['PurchaseOption'])

                    # dynamic information
                    for key2 in bk1['priceDimensions'].keys():
                        buffer2 = deepcopy(buffer1)
                        bk2 = bk1['priceDimensions'][key2]
                        buffer2.append(bk2["rateCode"])
                        buffer2.append(bk2["description"])
                        unit = bk2['unit']
                        bk3 = bk2['pricePerUnit']
                        for key3, value in bk3.items():
                            buffer3 = [seqno] + deepcopy(buffer2)
                            buffer3.append(f"{key3} {value}")
                            # print(f"buffer3: {buffer3}")
                            cwrite(buffer3)
                            seqno += 1


def main():
    pi = parse_index()
    pi.create_csv()


if __name__ == '__main__':
    main()
csv file (newcsvfile.csv) contents:
Output:
item no,offerTermCode,sku,LeaseContractLength,OfferingClass,PurchaseOption,rateCode,description,unit,pricePerUnit 1,MZU6U2429S,WS5A8EPNGYV4ZCR3,3yr,convertible,All Upfront,WS5A8EPNGYV4ZCR3.MZU6U2429S.2TG2D8R56U,Upfront Fee,USD 89879 2,MZU6U2429S,WS5A8EPNGYV4ZCR3,3yr,convertible,All Upfront,WS5A8EPNGYV4ZCR3.MZU6U2429S.6YS6EN2CT7,"USD 0.0 per Windows (Amazon VPC), m6a.12xlarge reserved instance applied",USD 0.0000000000 3,4NA7Y494T4,WS5A8EPNGYV4ZCR3,1yr,standard,No Upfront,WS5A8EPNGYV4ZCR3.4NA7Y494T4.6YS6EN2CT7,"Windows (Amazon VPC), m6a.12xlarge reserved instance applied",USD 3.8897400000 4,R5XV2EPZQZ,WS5A8EPNGYV4ZCR3,3yr,convertible,Partial Upfront,WS5A8EPNGYV4ZCR3.R5XV2EPZQZ.2TG2D8R56U,Upfront Fee,USD 45265 5,R5XV2EPZQZ,WS5A8EPNGYV4ZCR3,3yr,convertible,Partial Upfront,WS5A8EPNGYV4ZCR3.R5XV2EPZQZ.6YS6EN2CT7,"Windows (Amazon VPC), m6a.12xlarge reserved instance applied",USD 1.7224000000 6,38NPMPTW36,WS5A8EPNGYV4ZCR3,3yr,standard,Partial Upfront,WS5A8EPNGYV4ZCR3.38NPMPTW36.2TG2D8R56U,Upfront Fee,USD 43044 7,38NPMPTW36,WS5A8EPNGYV4ZCR3,3yr,standard,Partial Upfront,WS5A8EPNGYV4ZCR3.38NPMPTW36.6YS6EN2CT7,"Windows (Amazon VPC), m6a.12xlarge reserved instance applied",USD 1.6378900000 8,7NE97W5U4E,WS5A8EPNGYV4ZCR3,1yr,convertible,No Upfront,WS5A8EPNGYV4ZCR3.7NE97W5U4E.6YS6EN2CT7,"Windows (Amazon VPC), m6a.12xlarge reserved instance applied",USD 4.1377900000 9,VJWZNREJX2,WS5A8EPNGYV4ZCR3,1yr,convertible,All Upfront,WS5A8EPNGYV4ZCR3.VJWZNREJX2.2TG2D8R56U,Upfront Fee,USD 35120 10,VJWZNREJX2,WS5A8EPNGYV4ZCR3,1yr,convertible,All Upfront,WS5A8EPNGYV4ZCR3.VJWZNREJX2.6YS6EN2CT7,"USD 0.0 per Windows (Amazon VPC), m6a.12xlarge reserved instance applied",USD 0.0000000000 11,MZU6U2429S,Y6PPUWXG9ZYTC3KY,3yr,convertible,All Upfront,Y6PPUWXG9ZYTC3KY.MZU6U2429S.2TG2D8R56U,Upfront Fee,USD 65808 12,MZU6U2429S,Y6PPUWXG9ZYTC3KY,3yr,convertible,All Upfront,Y6PPUWXG9ZYTC3KY.MZU6U2429S.6YS6EN2CT7,"USD 0.0 per Red Hat Enterprise Linux (Amazon VPC), c5ad.24xlarge reserved instance applied",USD 0.0000000000 13,NQ3QZPMQV9,Y6PPUWXG9ZYTC3KY,3yr,standard,All Upfront,Y6PPUWXG9ZYTC3KY.NQ3QZPMQV9.2TG2D8R56U,Upfront Fee,USD 49898 14,NQ3QZPMQV9,Y6PPUWXG9ZYTC3KY,3yr,standard,All Upfront,Y6PPUWXG9ZYTC3KY.NQ3QZPMQV9.6YS6EN2CT7,"USD 0.0 per Red Hat Enterprise Linux (Amazon VPC), c5ad.24xlarge reserved instance applied",USD 0.0000000000
Reply
#5
This worked like a charm Smile Thank you for your inputs Smile
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Indirectlty convert string to float in JSON file WBPYTHON 6 5,943 May-06-2020, 12:09 PM
Last Post: WBPYTHON
  Parse XML String in Pandas Dataframe creedX 2 6,872 Dec-09-2019, 07:35 PM
Last Post: creedX
  parse json output to simple text or variable murali_datascience 1 2,413 Jun-25-2019, 02:56 PM
Last Post: perfringo
  Read Nested JSON with pandas.io.json palo173 4 9,581 Apr-29-2019, 01:25 PM
Last Post: palo173
  Pandas nested json data to dataframe FrankC 1 10,149 Aug-14-2018, 01:37 AM
Last Post: scidam
  Issue in getting the required data from nested json and store it in a pandas datafram PrateekG 2 3,304 May-20-2018, 11:25 AM
Last Post: scidam

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020