Python Forum
Read Nested JSON with pandas.io.json
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Read Nested JSON with pandas.io.json
#1
Hi, I need help with read a JSON for next working with data.

How Can I get table with 4 columns: Data.Code; Data.snapshots.DateFrom; Data.snapshots.Address.Street; Data.snapshots.Address.City

This is my code, but it is necessary to correct it, but I do not how. The Code works but it returns 30 columns and not exactly what I want.

Thank you for your time Shy

import pandas as pd
import requests
import pandas.io.json as pd_json

log = ("user", "password")
url = "http://serverxyz/api/v1/Catalog/Categories?pageSize=2&pageIndex=0"

req = requests.get(url, auth = log)
req.raise_for_status()
fin = req.json()

df = pd_json.json_normalize(fin, 
                        record_path=['Data','snapshots'],
                        record_prefix = 'Data.',
                        errors = 'ignore'
                        )

print(df)
This is my JSON:

My JSON sample to download from OneDrive


Quote:{
"PageIndex": 0,
"PageSize": 2,
"TotalCount": 100248,
"Data": [
{
"Code": "859182400102974",
"Timestamp": "2019-04-17T12:16:51Z",
"Category": 0,
"snapshots": [
{
"Code": "859182400102974",
"DateFrom": "2016-12-31T23:00:00Z",
"DateTo": "2017-05-09T22:00:00Z",
"Type": "CCO",
"VoltageLevel": 400,
"IsIsland": false,
"IsPps": false,
"MeasurementType": "CMC",
"InstalledPower": 0,
"GridId": 11,
"MeteredDataProvider": "8591824048108",
"Supplier": "8591824071403",
"SubjectOfSettlement": "8591824071403",
"IsSummarizingForSubjectOfSettlement": false,
"AnnualConsumptionEstimation": -502,
"TDDClass": "004",
"TempArea": "009",
"IsForeign": false,
"IsSLRActive": false,
"DGIFrequency": 1,
"FirstMonthReading": 5,
"IsCompositeService": true,
"IsAggregatedInvoice": true,
"IsImplicitSoS": false,
"ReservedPower": 0,
"PhasesCount": "3",
"IsMicrosource": false,
"IsDisconnectionPlanned": false,
"Name": "Petra"
},
{
"Code": "859182400102974",
"DateFrom": "2017-05-09T22:00:00Z",
"DateTo": "2018-01-31T23:00:00Z",
"Type": "CCO",
"VoltageLevel": 400,
"IsIsland": false,
"IsPps": false,
"MeasurementType": "CMC",
"InstalledPower": 0,
"GridId": 11,
"MeteredDataProvider": "8591824048108",
"Supplier": "8591824071403",
"SubjectOfSettlement": "8591824071403",
"IsSummarizingForSubjectOfSettlement": false,
"AnnualConsumptionEstimation": -382,
"TDDClass": "004",
"TempArea": "009",
"IsForeign": false,
"IsSLRActive": false,
"DGIFrequency": 1,
"FirstMonthReading": 5,
"IsCompositeService": true,
"IsAggregatedInvoice": true,
"IsImplicitSoS": false,
"ReservedPower": 0,
"PhasesCount": "3",
"IsMicrosource": false,
"IsDisconnectionPlanned": false,
"Name": "Petra"
}
],
"scalars": {
"ConsumptionEstimation": [
{
"DateFrom": "2016-12-31T23:00:00Z",
"DateTo": "2017-05-09T22:00:00Z",
"ConsumptionEstimation": -502
},
{
"DateFrom": "2017-05-09T22:00:00Z",
"DateTo": "2018-01-31T23:00:00Z",
"ConsumptionEstimation": -382
}
],
"ConsumptionEstimation2": [
{
"DateFrom": "2016-12-31T23:00:00Z",
"DateTo": "2017-05-09T22:00:00Z",
"ConsumptionEstimation2": -502
},
{
"DateFrom": "2017-05-09T22:00:00Z",
"DateTo": "2018-01-31T23:00:00Z",
"ConsumptionEstimation2": -382
}
]
}
},
{
"Code": "859182400104897",
"Timestamp": "2019-04-17T12:16:51Z",
"Category": 0,
"snapshots": [
{
"Code": "859182400104897",
"DateFrom": "2016-11-18T23:00:00Z",
"DateTo": "2017-11-05T23:00:00Z",
"Type": "CCO",
"VoltageLevel": 400,
"IsIsland": false,
"IsPps": false,
"MeasurementType": "CMC",
"InstalledPower": 0,
"GridId": 11,
"MeteredDataProvider": "8591824048108",
"Supplier": "8591824071403",
"SubjectOfSettlement": "8591824071403",
"IsSummarizingForSubjectOfSettlement": false,
"AnnualConsumptionEstimation": -280,
"TDDClass": "004",
"TempArea": "009",
"IsForeign": false,
"Address": {
"Street": "Okružní",
"City": "Semovo Ústí",
"PostCode": "39102"
},
"IsSLRActive": false,
"DGIFrequency": 0,
"FirstMonthReading": 0,
"IsCompositeService": false,
"IsAggregatedInvoice": false,
"IsImplicitSoS": false,
"ReservedPower": 0,
"IsMicrosource": false,
"IsDisconnectionPlanned": false,
"Name": "Martin"
},
{
"Code": "859182400104897",
"DateFrom": "2017-11-05T23:00:00Z",
"DateTo": "2027-01-16T23:00:00Z",
"Type": "CCO",
"VoltageLevel": 400,
"IsIsland": false,
"IsPps": false,
"MeasurementType": "CMC",
"InstalledPower": 0,
"GridId": 11,
"MeteredDataProvider": "8591824048108",
"Supplier": "8591824071403",
"SubjectOfSettlement": "8591824071403",
"IsSummarizingForSubjectOfSettlement": false,
"AnnualConsumptionEstimation": -282,
"TDDClass": "004",
"TempArea": "009",
"IsForeign": false,
"Address": {
"Street": "Okružní",
"City": "Semovo Ústí",
"PostCode": "39102"
},
"IsSLRActive": false,
"DGIFrequency": 0,
"FirstMonthReading": 0,
"IsCompositeService": false,
"IsAggregatedInvoice": false,
"IsImplicitSoS": false,
"ReservedPower": 0,
"IsMicrosource": false,
"IsDisconnectionPlanned": false,
"Name": "Martin"
}
],
"scalars": {
"ConsumptionEstimation": [
{
"DateFrom": "2016-11-18T23:00:00Z",
"DateTo": "2017-11-05T23:00:00Z",
"ConsumptionEstimation": -280
},
{
"DateFrom": "2017-11-05T23:00:00Z",
"DateTo": "2027-01-16T23:00:00Z",
"ConsumptionEstimation": -282
}
],
"ConsumptionEstimation2": [
{
"DateFrom": "2016-11-18T23:00:00Z",
"DateTo": "2017-11-05T23:00:00Z",
"ConsumptionEstimation2": -280
},
{
"DateFrom": "2017-11-05T23:00:00Z",
"DateTo": "2027-01-16T23:00:00Z",
"ConsumptionEstimation2": -282
}
]
}
}
]
}
Reply
#2
You can access only some columns of df using:
new_df = df[['Data.Code', 'Data.DateFrom', 'Data.Address']]
Reply
#3
(Apr-23-2019, 03:24 PM)gontajones Wrote: You can access only some columns of df using:
new_df = df[['Data.Code', 'Data.DateFrom', 'Data.Address']]

- But what about column Data.Address? There is lot of information in one column but not separetly. Instead of {'Street': 'Okružní', 'City': 'Sezimovo Ústí', 'PostCode': '39102'} I want only 2 separate columns Street and City

- And prefix of column is not only Data.xyz but for examlpe Data.snapshots.DateFrom or Data.snapshots.Address.Street etc.

- And it is not better use "df = pd_json.json_normalize" for reading and assigning to "df" only columns which I want, not all columns?

I think this part of code is necessary to modify, but I do not how

df = pd_json.json_normalize(fin, 
                        record_path=['Data','snapshots'],
                        record_prefix = 'Data.',
                        errors = 'ignore'
                        )
Reply
#4
This is not the best way but it is a start...

import pandas as pd
import pandas.io.json as pd_json
import json

with open("data.json", "r") as fd:
    json_content = json.load(fd)
    my_df_list = []
    for _, entry in enumerate(json_content['Data']):
        my_df_list.append(pd_json.json_normalize(entry['snapshots'], record_prefix='Data.'))
    df = pd.concat(my_df_list, ignore_index=True, sort=False)
    print(df)
Reply
#5
I think, your code is good way how to start solve my problem.
Although I have a problem with transform it just like my ideas. I can not find simple example, how to go deeper or shallower in nested JSON (JSON with lot of levels). I want to know how to get one information from each level of JSON and put it into table.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  How to compare two json and write to third json differences with pandas and numpy onenessboy 0 422 Jul-24-2020, 01:56 PM
Last Post: onenessboy
  JSON file Loading issue punna111 4 3,110 Jun-29-2020, 08:07 AM
Last Post: buran
  Loading multiple JSON files to create a csv 0LI5A3A 0 243 Jun-28-2020, 10:35 PM
Last Post: 0LI5A3A
  Can't read text file with pandas zinho 6 3,679 May-24-2020, 06:13 AM
Last Post: azajali43
  Indirectlty convert string to float in JSON file WBPYTHON 6 598 May-06-2020, 12:09 PM
Last Post: WBPYTHON
  Read json array data by pandas vipinct 0 360 Apr-13-2020, 02:24 PM
Last Post: vipinct
  Help batch converting .json chosen file to MySQL BrandonKastning 2 445 Mar-14-2020, 09:19 PM
Last Post: BrandonKastning
  Transform Facebook Graph API insights JSON to pandas dataframe usman 0 529 Mar-03-2020, 05:14 AM
Last Post: usman
  JSON to Dataframe DrX 1 441 Feb-19-2020, 07:58 PM
Last Post: DrX
  getting trailing zeros with 1 during pandas read fullstop 1 997 Jan-05-2020, 04:01 PM
Last Post: ichabod801

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020