Parse Nested JSON String in Python - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: Data Science (https://python-forum.io/forum-44.html) +--- Thread: Parse Nested JSON String in Python (/thread-38090.html) |
Parse Nested JSON String in Python - rwalde - Sep-02-2022 Hello All, I am trying to parse complex JSON String from AWS Pricing JSON file and convert it into CSV file based on some conditions. I was able to achieve the parsing code implementation using for loops and pandas, but need more efficient way of implementing the code as the performance of the code is really not good. I wanted understand how we can parse complex JSON without using for loops or if we have any other efficient way to implement the code, I tried using JSON normalize in pandas also. I would really appreciate the inputs. Below is the code snippet I have used - def create_all_ri_regions_pricing_file(): with open(INDEX_JSON_1) as pricing: read_content = json.load(pricing) reserved_df = read_content["terms"]["Reserved"] df_final = pd.DataFrame() for key in reserved_df: for key2 in reserved_df[key]: df = pd.DataFrame(reserved_df[key][key2]).reset_index(drop=True) print("Data frame 0--> ", df.columns) df["LeaseContractLength"] = reserved_df[key][key2]["termAttributes"]["LeaseContractLength"] df["PurchaseOption"] = reserved_df[key][key2]["termAttributes"]["PurchaseOption"] df["OfferingClass"] = reserved_df[key][key2]["termAttributes"]["OfferingClass"] df = df.drop(columns=["termAttributes"]) df = pd.concat([df["priceDimensions"].apply(pd.Series), df.drop("priceDimensions", axis=1)], axis=1) df = df[ [ "offerTermCode", "sku", "LeaseContractLength", "OfferingClass", "PurchaseOption", "rateCode", "description", "unit", "pricePerUnit", ] ] print("Data frame --> ", df) df_final = pd.concat([df_final, df]).reset_index(drop=True) df_final = df_final.loc[ (df_final["LeaseContractLength"] == "1yr") & (df_final["OfferingClass"] == "standard") & (df_final["PurchaseOption"] == "All Upfront") & (df_final["unit"] == "Hrs") ] # print("final Data frame --> ", df_final) df_final.to_csv("./final.csv")I have attached the JSON file to be parsed and expected CSV file for the reference. RE: Parse Nested JSON String in Python - Larz60+ - Sep-02-2022 None of the items in the index sample seem to match up with the expectedCSVFile.csv contents. If I am wrong, let me know, otherwise please re-post matched samples. New Edit: Also, I see multiple 'priceDimensions' in the json file. Do you expect a separate entry in the csv file for each (as I would expect)? Another edit: I also see that pricePerUnit could potentially contain many values. This leads me to the question, why do you want to move the data to csv file, when the json data is much more navigable? RE: Parse Nested JSON String in Python - deanhystad - Sep-02-2022 Oops. Replied to wrong post. RE: Parse Nested JSON String in Python - Larz60+ - Sep-02-2022 The following code will give you a CSV file with each entry in the json file normalized into one or more csv records. The json file would be much easier to use. I love pandas, but think it's an additional layer that is not necessary in this instance, so didn't use it. the jason file used was the sample file you posted. import os from pathlib import Path from copy import deepcopy import csv import json import sys class parse_index: def __init__(self): # files are in same path as script, assure path here. os.chdir(os.path.abspath(os.path.dirname(__file__))) filepath = Path('.') jsonfile = filepath / 'index.json' with jsonfile.open() as fp: self.jindex = json.load(fp) self.csvfile = filepath / 'newcsvfile.csv' def create_csv(self): header = ["item no", "offerTermCode", "sku", "LeaseContractLength", "OfferingClass", "PurchaseOption", "rateCode", "description", "unit", "pricePerUnit"] base = self.jindex["terms"]["Reserved"] with self.csvfile.open('w') as cfptr: cwrite = csv.writer(cfptr, delimiter=',').writerow cwrite(header) seqno = 1 for key in base.keys(): # print(f"key: {key}") bk = base[key] for key1 in bk.keys(): bk1 = bk[key1] buffer1 = [] buffer1.append(bk1['offerTermCode']) buffer1.append(bk1['sku']) buffer1.append(bk1['termAttributes']['LeaseContractLength']) buffer1.append(bk1['termAttributes']['OfferingClass']) buffer1.append(bk1['termAttributes']['PurchaseOption']) # dynamic information for key2 in bk1['priceDimensions'].keys(): buffer2 = deepcopy(buffer1) bk2 = bk1['priceDimensions'][key2] buffer2.append(bk2["rateCode"]) buffer2.append(bk2["description"]) unit = bk2['unit'] bk3 = bk2['pricePerUnit'] for key3, value in bk3.items(): buffer3 = [seqno] + deepcopy(buffer2) buffer3.append(f"{key3} {value}") # print(f"buffer3: {buffer3}") cwrite(buffer3) seqno += 1 def main(): pi = parse_index() pi.create_csv() if __name__ == '__main__': main()csv file (newcsvfile.csv) contents:
RE: Parse Nested JSON String in Python - rwalde - Sep-08-2022 This worked like a charm Thank you for your inputs |