Posts: 3
Threads: 1
Joined: Apr 2025
Apr-02-2025, 05:49 PM
(This post was last modified: Apr-02-2025, 08:28 PM by deanhystad.)
hi all,
I just wrote my hello-world program in Python, and I need some guidance on reading a somewhat complex JSON file so I can upload it to a SQL Server table.
I installed pandas and I see it has an input/output function, pandas.json_normalize().
I see this example:
>>> data = [
...     {
...         "id": 1,
...         "name": "Cole Volk",
...         "fitness": {"height": 130, "weight": 60},
...     },
...     {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
...     {
...         "id": 2,
...         "name": "Faye Raker",
...         "fitness": {"height": 130, "weight": 60},
...     },
... ]
>>> pd.json_normalize(data, max_level=1)
    id        name  fitness.height  fitness.weight
0  1.0   Cole Volk             130              60
1  NaN    Mark Reg             130              60
2  2.0  Faye Raker             130              60
which is exactly what I need as an output.
Basically, I start with this multiply nested JSON, and I need to build a file where the various levels become level1.level2.level3... columns in a table, with the values in rows.
Please help me take the first step: how do I get the data argument from my initial json file myFile.json?
pandas.json_normalize(data, record_path=None, meta=None, meta_prefix=None, record_prefix=None, errors='raise', sep='.', max_level=None)
Thanks,
elsvieta
Posts: 3
Threads: 1
Joined: Apr 2025
ok,
I figured out the following:
1. need to start with import pandas as pd
2. need to create a data frame object df = pd.read_json(r"C:\path\to\file\input.json")
3. then I would create a new csv file with df.to_csv(r'C:\path\to\file\output.csv') (raw string again, so the backslashes are not treated as escape sequences)
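A sketch of those three steps chained together. It writes a tiny JSON file into a temporary directory first, purely so the example is self-contained and runnable; in practice you would point read_json at your real input.json (with raw strings for Windows paths, as in step 2):

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Create a tiny JSON file so the sketch runs anywhere;
# in practice this would be your existing input.json.
tmp = Path(tempfile.mkdtemp())
src = tmp / "input.json"
src.write_text(json.dumps([{"id": 1, "name": "Cole Volk"}]))

# Step 2: load the JSON file into a DataFrame.
df = pd.read_json(src)

# Step 3: write the DataFrame out as CSV.
# With a Windows path, use a raw string, e.g. r"C:\path\to\file\output.csv".
out = tmp / "output.csv"
df.to_csv(out, index=False)

print(out.read_text())
```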
The output.csv was not very different from the initial file. I need to see how the normalize command operates on the data frame.
I'm still waiting for any advice,
Thanks,
Regards,
elsvieta
Posts: 6,779
Threads: 20
Joined: Feb 2020
Apr-02-2025, 09:01 PM
(This post was last modified: Apr-02-2025, 09:01 PM by deanhystad.)
You do not build a json file. You create a python object and use json.dump(obj, file) to create a json file.
This is what the json file looks like for the list in your post.
[{"id": 1, "name": "Cole Volk", "fitness": {"height": 130, "weight": 60}}, {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}}, {"id": 2, "name": "Faye Raker", "fitness": {"height": 130, "weight": 60}}]
I created it like this:
import json

# This is not json. This is a list of dictionaries.
data = [
    {
        "id": 1,
        "name": "Cole Volk",
        "fitness": {"height": 130, "weight": 60},
    },
    {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
    {
        "id": 2,
        "name": "Faye Raker",
        "fitness": {"height": 130, "weight": 60},
    },
]

with open("test.json", "w") as file:
    json.dump(data, file)  # writes data to the file using json syntax (dump returns None, so don't assign its result)

If you had some records from a database that you want to put in a json file:
1. Retrieve the records from the database.
2. Put the records in a list.
3. Open a file for writing.
4. Dump the list to the file.
You can use indent in the dump command to make the file prettier, but the purpose of json is portable storage and retrieval. json is not a human friendly presentation format.
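For example, a small sketch of indent in action (writing to a temporary directory so it runs anywhere):

```python
import json
import tempfile
from pathlib import Path

data = [{"id": 1, "fitness": {"height": 130, "weight": 60}}]

# indent=4 pretty-prints the file; leave it out for compact,
# machine-oriented output.
path = Path(tempfile.mkdtemp()) / "pretty.json"
with open(path, "w") as file:
    json.dump(data, file, indent=4)

text = path.read_text()
print(text)
```

Either way, json.load() reads the file back identically; the indentation only affects how it looks to a human.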
This is a program that reads the json file and prints the list.
import json

with open("test.json", "r") as file:
    data = json.load(file)
print(data)

Output: [{'id': 1, 'name': 'Cole Volk', 'fitness': {'height': 130, 'weight': 60}}, {'name': 'Mark Reg', 'fitness': {'height': 130, 'weight': 60}}, {'id': 2, 'name': 'Faye Raker', 'fitness': {'height': 130, 'weight': 60}}]
To make it pretty I can use json.dumps() with indent.
import json

with open("test.json", "r") as file:
    data = json.load(file)
print(json.dumps(data, indent=4))

Output:
[
    {
        "id": 1,
        "name": "Cole Volk",
        "fitness": {
            "height": 130,
            "weight": 60
        }
    },
    {
        "name": "Mark Reg",
        "fitness": {
            "height": 130,
            "weight": 60
        }
    },
    {
        "id": 2,
        "name": "Faye Raker",
        "fitness": {
            "height": 130,
            "weight": 60
        }
    }
]
json.dumps(obj) is like json.dump(obj, file) except it produces a string instead of writing to a file. There is a corresponding json.loads(string) that is similar to json.load(file).
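A quick round trip showing the string variants:

```python
import json

obj = {"name": "Cole Volk", "fitness": {"height": 130, "weight": 60}}

# dumps: Python object -> JSON string, no file involved.
text = json.dumps(obj)

# loads: JSON string -> Python object again.
restored = json.loads(text)

print(text)
```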
Posts: 6,779
Threads: 20
Joined: Feb 2020
Apr-02-2025, 09:40 PM
(This post was last modified: Apr-02-2025, 09:41 PM by deanhystad.)
Sounds like you know all you need to proceed. Time for you to start writing some code. Trial and error is important at the early stages of learning.
I wrote this to see what happens when I use pandas to load the json file I created earlier.
import pandas as pd

df = pd.read_json("data.json")
print(df)

Output:
    id        name                        fitness
0  1.0   Cole Volk  {'height': 130, 'weight': 60}
1  NaN    Mark Reg  {'height': 130, 'weight': 60}
2  2.0  Faye Raker  {'height': 130, 'weight': 60}
Notice that fitness is a dictionary, not columns. If I try to write the pandas dataframe to a csv format file:
import pandas as pd

df = pd.read_json("data.json")
df.to_csv("data.csv")

I get this in data.csv:

Output:
,id,name,fitness
0,1.0,Cole Volk,"{'height': 130, 'weight': 60}"
1,,Mark Reg,"{'height': 130, 'weight': 60}"
2,2.0,Faye Raker,"{'height': 130, 'weight': 60}"
Not really a csv format file. That is why the dataframe had to be normalized.
import json
import pandas as pd

with open("data.json", "r") as file:
    data = json.load(file)
df = pd.json_normalize(data, max_level=1)
print(df)
df.to_csv("data.csv")

Output:
    id        name  fitness.height  fitness.weight
0  1.0   Cole Volk             130              60
1  NaN    Mark Reg             130              60
2  2.0  Faye Raker             130              60
data.csv:

,id,name,fitness.height,fitness.weight
0,1.0,Cole Volk,130,60
1,,Mark Reg,130,60
2,2.0,Faye Raker,130,60

Notice in your original post that the call to pd.json_normalize() takes data, a list of dictionaries, not an existing dataframe, as input. Read the json file to get the list of dictionaries, then use json_normalize(data) to create a dataframe that expands each dictionary into columns.
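A side note on deeper nesting: when a field holds a list of records rather than a nested dictionary, max_level alone will not flatten it; json_normalize has record_path and meta arguments for that case. A sketch with made-up field names (the "workouts" key is hypothetical, not from the data above):

```python
import pandas as pd

# Hypothetical data: each person carries a list of "workouts" records.
data = [
    {"name": "Cole Volk", "workouts": [{"day": "Mon", "km": 5}, {"day": "Tue", "km": 3}]},
    {"name": "Faye Raker", "workouts": [{"day": "Mon", "km": 8}]},
]

# record_path picks the list to explode into one row per record;
# meta carries parent-level fields down onto each of those rows.
df = pd.json_normalize(data, record_path="workouts", meta=["name"])
print(df)
```

Each workout becomes its own row, with the parent's name repeated alongside it.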
Your next post better have some code you wrote and error messages or sample input/output.
Posts: 3
Threads: 1
Joined: Apr 2025
(Apr-02-2025, 09:40 PM)deanhystad Wrote: Sounds like you know all you need to proceed. Time for you to start writing some code. Trial and error is important at the early stages of learning.
deanhystad,
thank you very much for your advice!
I do not have any code to post, yet, since I just used your few lines and it kinda worked.
I was able to upload 236 rows to a SQL Server table, which matches the number of records shown when running the print(df) command. The same goes for the number of columns: 127. That cannot be right; I believe the number of columns should be more like 250 ...
I will work on this all day tomorrow and try to come up with a sample input file. It has many levels of nesting, and I see many objects nested inside arrays ... the table is correct up to about the 10th column, where I see the first nesting, and that is where normalization seems to stop working. The column names no longer look like SomeName.SomeAttribute, etc. Strangely enough, this happens toward the end of the columns in the table, but by that point all the data is shifted and cannot be used.
Thank you again; please give me a little time to practice.
Regards,
elsvieta
Posts: 6,779
Threads: 20
Joined: Feb 2020
When printing large dataframes, not all rows and columns are printed. Pandas shows you the start and end and leaves out the middle. If you want to know the size of the dataframe, ask the dataframe.
import pandas as pd

df = pd.DataFrame([{"A": 1, "B": 2, "C": 3}, {"A": 4, "B": 5, "C": 6}])
print(df)
print(df.shape)

Output:
   A  B  C
0  1  2  3
1  4  5  6
(2, 3)
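Related: if you want print(df) to actually show every column of a wide dataframe (like your 127-column one), you can raise the display limits. A sketch:

```python
import pandas as pd

# A deliberately wide dataframe: 1 row, 30 columns.
df = pd.DataFrame([range(30)])

# By default pandas elides columns beyond display.max_columns;
# None means "no limit". display.width=None also stops the
# terminal-width logic from hiding columns.
pd.set_option("display.max_columns", None)
pd.set_option("display.width", None)
print(df)
print(df.shape)
```

Even so, df.shape is the reliable way to count rows and columns.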
If you don't know what your database looks like, download a browser like DB Browser.
Posts: 1,090
Threads: 143
Joined: Jul 2017
csv files and json files are just text files, no need to get complicated!
Your data is faulty: in SQL, a unique identifier, call it id if you like, is very important.
I don't like the NaN values in the id column in pandas; you are asking for problems in SQL. SQL likes a unique, auto-incremented PRIMARY KEY, so why would you have another one?
So forget the id key in your data, which is missing from some records, and create the id yourself so that every record has a unique ID:
import json

# this data is a list
data = [
    {
        "id": 1,
        "name": "Cole Volk",
        "fitness": {"height": 130, "weight": 60},
    },
    {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
    {
        "id": 2,
        "name": "Faye Raker",
        "fitness": {"height": 130, "weight": 60},
    },
    {"name": "Nick Cave", "fitness": {"height": 180, "weight": 100}},
]
print(len(data))  # 4 records

# save data as json
path2json = '/home/pedro/temp/myjson.json'
with open(path2json, 'w') as myjson:
    json.dump(data, myjson)

with open(path2json) as ajson:
    myjson = json.load(ajson)
# have a look at myjson.json
print(myjson)

path2csv = '/home/pedro/temp/mycsv.csv'
autoinc = 0
with open(path2csv, 'w') as csvfile:
    s = "id,name,height,weight\n"
    csvfile.write(s)
    for d in data:
        autoinc += 1
        mylist = [str(autoinc), d["name"], str(d["fitness"]["height"]), str(d["fitness"]["weight"])]
        s = ','.join(mylist) + '\n'
        csvfile.write(s)

mycsv.csv looks like:
Output: id,name,height,weight
1,Cole Volk,130,60
2,Mark Reg,130,60
3,Faye Raker,130,60
4,Nick Cave,180,100
Make an SQL table:
Quote:CREATE TABLE IF NOT EXISTS people (id INT AUTO_INCREMENT PRIMARY KEY NOT NULL, name VARCHAR(50) DEFAULT 'X',
height INT NOT NULL DEFAULT 5, weight INT NOT NULL DEFAULT 1) DEFAULT CHARACTER SET UTF8MB4 ENGINE=InnoDB
Now you can import the mycsv.csv into your database, skipping, of course, the first row, which is the column headers.
Thereafter, if you want to add 1 or more records, the SQL column id will auto increment:
Quote:INSERT INTO people (name, height, weight) VALUES ('Cocky Kevin', 185 , 66) ;
Additional records can be read from a csv, and INSERT commands issued for each record using Python. Each new record will get a unique PRIMARY KEY.
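That last step can be sketched with parameterized queries. The sketch below uses the standard-library sqlite3 purely as a stand-in for SQL Server so it is runnable anywhere; with a SQL Server driver such as pyodbc the pattern is the same, only the connection setup and the placeholder style differ:

```python
import sqlite3

# sqlite3 stands in for SQL Server here; the parameterized-INSERT
# pattern is the same with pyodbc, just different placeholders.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "name TEXT, height INTEGER, weight INTEGER)"
)

# These rows could come straight from csv.reader on mycsv.csv
# (skipping the header row).
rows = [("Cole Volk", 130, 60), ("Mark Reg", 130, 60)]

# Parameterized INSERT: the driver quotes the values safely,
# and the auto-increment column fills itself in.
con.executemany(
    "INSERT INTO people (name, height, weight) VALUES (?, ?, ?)", rows
)
con.commit()

for row in con.execute("SELECT id, name FROM people ORDER BY id"):
    print(row)
```

Never build INSERT statements by string concatenation; the placeholder form above is both safer and simpler.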