Python Forum
Read Nested JSON with pandas.io.json - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: Read Nested JSON with pandas.io.json (/thread-17775.html)



Read Nested JSON with pandas.io.json - palo173 - Apr-23-2019

Hi, I need help with read a JSON for next working with data.

How Can I get table with 4 columns: Data.Code; Data.snapshots.DateFrom; Data.snapshots.Address.Street; Data.snapshots.Address.City

This is my code, but it is necessary to correct it, but I do not how. The Code works but it returns 30 columns and not exactly what I want.

Thank you for your time Shy

import pandas as pd
import requests
import pandas.io.json as pd_json

log = ("user", "password")
url = "http://serverxyz/api/v1/Catalog/Categories?pageSize=2&pageIndex=0"

req = requests.get(url, auth = log)
req.raise_for_status()
fin = req.json()

df = pd_json.json_normalize(fin, 
                        record_path=['Data','snapshots'],
                        record_prefix = 'Data.',
                        errors = 'ignore'
                        )

print(df)
This is my JSON:

My JSON sample to download from OneDrive


Quote:{
"PageIndex": 0,
"PageSize": 2,
"TotalCount": 100248,
"Data": [
{
"Code": "859182400102974",
"Timestamp": "2019-04-17T12:16:51Z",
"Category": 0,
"snapshots": [
{
"Code": "859182400102974",
"DateFrom": "2016-12-31T23:00:00Z",
"DateTo": "2017-05-09T22:00:00Z",
"Type": "CCO",
"VoltageLevel": 400,
"IsIsland": false,
"IsPps": false,
"MeasurementType": "CMC",
"InstalledPower": 0,
"GridId": 11,
"MeteredDataProvider": "8591824048108",
"Supplier": "8591824071403",
"SubjectOfSettlement": "8591824071403",
"IsSummarizingForSubjectOfSettlement": false,
"AnnualConsumptionEstimation": -502,
"TDDClass": "004",
"TempArea": "009",
"IsForeign": false,
"IsSLRActive": false,
"DGIFrequency": 1,
"FirstMonthReading": 5,
"IsCompositeService": true,
"IsAggregatedInvoice": true,
"IsImplicitSoS": false,
"ReservedPower": 0,
"PhasesCount": "3",
"IsMicrosource": false,
"IsDisconnectionPlanned": false,
"Name": "Petra"
},
{
"Code": "859182400102974",
"DateFrom": "2017-05-09T22:00:00Z",
"DateTo": "2018-01-31T23:00:00Z",
"Type": "CCO",
"VoltageLevel": 400,
"IsIsland": false,
"IsPps": false,
"MeasurementType": "CMC",
"InstalledPower": 0,
"GridId": 11,
"MeteredDataProvider": "8591824048108",
"Supplier": "8591824071403",
"SubjectOfSettlement": "8591824071403",
"IsSummarizingForSubjectOfSettlement": false,
"AnnualConsumptionEstimation": -382,
"TDDClass": "004",
"TempArea": "009",
"IsForeign": false,
"IsSLRActive": false,
"DGIFrequency": 1,
"FirstMonthReading": 5,
"IsCompositeService": true,
"IsAggregatedInvoice": true,
"IsImplicitSoS": false,
"ReservedPower": 0,
"PhasesCount": "3",
"IsMicrosource": false,
"IsDisconnectionPlanned": false,
"Name": "Petra"
}
],
"scalars": {
"ConsumptionEstimation": [
{
"DateFrom": "2016-12-31T23:00:00Z",
"DateTo": "2017-05-09T22:00:00Z",
"ConsumptionEstimation": -502
},
{
"DateFrom": "2017-05-09T22:00:00Z",
"DateTo": "2018-01-31T23:00:00Z",
"ConsumptionEstimation": -382
}
],
"ConsumptionEstimation2": [
{
"DateFrom": "2016-12-31T23:00:00Z",
"DateTo": "2017-05-09T22:00:00Z",
"ConsumptionEstimation2": -502
},
{
"DateFrom": "2017-05-09T22:00:00Z",
"DateTo": "2018-01-31T23:00:00Z",
"ConsumptionEstimation2": -382
}
]
}
},
{
"Code": "859182400104897",
"Timestamp": "2019-04-17T12:16:51Z",
"Category": 0,
"snapshots": [
{
"Code": "859182400104897",
"DateFrom": "2016-11-18T23:00:00Z",
"DateTo": "2017-11-05T23:00:00Z",
"Type": "CCO",
"VoltageLevel": 400,
"IsIsland": false,
"IsPps": false,
"MeasurementType": "CMC",
"InstalledPower": 0,
"GridId": 11,
"MeteredDataProvider": "8591824048108",
"Supplier": "8591824071403",
"SubjectOfSettlement": "8591824071403",
"IsSummarizingForSubjectOfSettlement": false,
"AnnualConsumptionEstimation": -280,
"TDDClass": "004",
"TempArea": "009",
"IsForeign": false,
"Address": {
"Street": "Okružní",
"City": "Semovo Ústí",
"PostCode": "39102"
},
"IsSLRActive": false,
"DGIFrequency": 0,
"FirstMonthReading": 0,
"IsCompositeService": false,
"IsAggregatedInvoice": false,
"IsImplicitSoS": false,
"ReservedPower": 0,
"IsMicrosource": false,
"IsDisconnectionPlanned": false,
"Name": "Martin"
},
{
"Code": "859182400104897",
"DateFrom": "2017-11-05T23:00:00Z",
"DateTo": "2027-01-16T23:00:00Z",
"Type": "CCO",
"VoltageLevel": 400,
"IsIsland": false,
"IsPps": false,
"MeasurementType": "CMC",
"InstalledPower": 0,
"GridId": 11,
"MeteredDataProvider": "8591824048108",
"Supplier": "8591824071403",
"SubjectOfSettlement": "8591824071403",
"IsSummarizingForSubjectOfSettlement": false,
"AnnualConsumptionEstimation": -282,
"TDDClass": "004",
"TempArea": "009",
"IsForeign": false,
"Address": {
"Street": "Okružní",
"City": "Semovo Ústí",
"PostCode": "39102"
},
"IsSLRActive": false,
"DGIFrequency": 0,
"FirstMonthReading": 0,
"IsCompositeService": false,
"IsAggregatedInvoice": false,
"IsImplicitSoS": false,
"ReservedPower": 0,
"IsMicrosource": false,
"IsDisconnectionPlanned": false,
"Name": "Martin"
}
],
"scalars": {
"ConsumptionEstimation": [
{
"DateFrom": "2016-11-18T23:00:00Z",
"DateTo": "2017-11-05T23:00:00Z",
"ConsumptionEstimation": -280
},
{
"DateFrom": "2017-11-05T23:00:00Z",
"DateTo": "2027-01-16T23:00:00Z",
"ConsumptionEstimation": -282
}
],
"ConsumptionEstimation2": [
{
"DateFrom": "2016-11-18T23:00:00Z",
"DateTo": "2017-11-05T23:00:00Z",
"ConsumptionEstimation2": -280
},
{
"DateFrom": "2017-11-05T23:00:00Z",
"DateTo": "2027-01-16T23:00:00Z",
"ConsumptionEstimation2": -282
}
]
}
}
]
}



RE: Read Nested JSON with pandas.io.json - gontajones - Apr-23-2019

You can access only some columns of df using:
new_df = df[['Data.Code', 'Data.DateFrom', 'Data.Address']]



RE: Read Nested JSON with pandas.io.json - palo173 - Apr-24-2019

(Apr-23-2019, 03:24 PM)gontajones Wrote: You can access only some columns of df using:
new_df = df[['Data.Code', 'Data.DateFrom', 'Data.Address']]

- But what about column Data.Address? There is lot of information in one column but not separetly. Instead of {'Street': 'Okružní', 'City': 'Sezimovo Ústí', 'PostCode': '39102'} I want only 2 separate columns Street and City

- And prefix of column is not only Data.xyz but for examlpe Data.snapshots.DateFrom or Data.snapshots.Address.Street etc.

- And it is not better use "df = pd_json.json_normalize" for reading and assigning to "df" only columns which I want, not all columns?

I think this part of code is necessary to modify, but I do not how

df = pd_json.json_normalize(fin, 
                        record_path=['Data','snapshots'],
                        record_prefix = 'Data.',
                        errors = 'ignore'
                        )



RE: Read Nested JSON with pandas.io.json - gontajones - Apr-24-2019

This is not the best way but it is a start...

import pandas as pd
import pandas.io.json as pd_json
import json

with open("data.json", "r") as fd:
    json_content = json.load(fd)
    my_df_list = []
    for _, entry in enumerate(json_content['Data']):
        my_df_list.append(pd_json.json_normalize(entry['snapshots'], record_prefix='Data.'))
    df = pd.concat(my_df_list, ignore_index=True, sort=False)
    print(df)



RE: Read Nested JSON with pandas.io.json - palo173 - Apr-29-2019

I think, your code is good way how to start solve my problem.
Although I have a problem with transform it just like my ideas. I can not find simple example, how to go deeper or shallower in nested JSON (JSON with lot of levels). I want to know how to get one information from each level of JSON and put it into table.