Posts: 3
Threads: 1
Joined: Apr 2025
Apr-02-2025, 05:49 PM
(This post was last modified: Apr-02-2025, 08:28 PM by deanhystad.)
hi all,
I just wrote my hello-world program in Python, and I need some guidance on reading a somewhat complex JSON file so I can upload it to a SQL Server table.
I installed pandas and I see it has an input/output function, pandas.json_normalize().
I see this example:
>>> data = [
...     {
...         "id": 1,
...         "name": "Cole Volk",
...         "fitness": {"height": 130, "weight": 60},
...     },
...     {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
...     {
...         "id": 2,
...         "name": "Faye Raker",
...         "fitness": {"height": 130, "weight": 60},
...     },
... ]
>>> pd.json_normalize(data, max_level=1)
    id        name  fitness.height  fitness.weight
0  1.0   Cole Volk             130              60
1  NaN    Mark Reg             130              60
2  2.0  Faye Raker             130              60
which is exactly what I need as an output.
Basically, I start with this multiply nested JSON, and I need to build a file where the various levels become level1.level2.level3... columns in a table, with the values in rows.
Please help me take the first step: how do I get the data argument from my initial json file myFile.json?
pandas.json_normalize(data, record_path=None, meta=None, meta_prefix=None, record_prefix=None, errors='raise', sep='.', max_level=None)
Thanks,
elsvieta
Posts: 3
Threads: 1
Joined: Apr 2025
ok,
I figured out the following:
1. need to start with import pandas as pd
2. need to create a data frame object df = pd.read_json(r"C:\path\to\file\input.json")
3. then I would create a new csv file with df.to_csv(r'C:\path\to\file\output.csv') (raw string again, so the backslashes are not treated as escape sequences)
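A sketch of those three steps chained together. It writes a tiny JSON file into a temporary directory first, purely so the example is self-contained and runnable; in practice you would point read_json at your real input.json (with raw strings for Windows paths, as in step 2):

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Create a tiny JSON file so the sketch runs anywhere;
# in practice this would be your existing input.json.
tmp = Path(tempfile.mkdtemp())
src = tmp / "input.json"
src.write_text(json.dumps([{"id": 1, "name": "Cole Volk"}]))

# Step 2: load the JSON file into a DataFrame.
df = pd.read_json(src)

# Step 3: write the DataFrame out as CSV.
# With a Windows path, use a raw string, e.g. r"C:\path\to\file\output.csv".
out = tmp / "output.csv"
df.to_csv(out, index=False)

print(out.read_text())
```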
The output.csv was not very different from the initial file. I need to see how the normalize command operates on the data frame.
I'm still waiting for any advice,
Thanks,
Regards,
elsvieta
Posts: 6,779
Threads: 20
Joined: Feb 2020
Apr-02-2025, 09:01 PM
(This post was last modified: Apr-02-2025, 09:01 PM by deanhystad.)
You do not build a json file. You create a python object and use json.dump(obj, file) to create a json file.
This is what the json file looks like for the list in your post.
[{"id": 1, "name": "Cole Volk", "fitness": {"height": 130, "weight": 60}}, {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}}, {"id": 2, "name": "Faye Raker", "fitness": {"height": 130, "weight": 60}}]
I created it like this:
import json

# This is not json. This is a list of dictionaries.
data = [
    {
        "id": 1,
        "name": "Cole Volk",
        "fitness": {"height": 130, "weight": 60},
    },
    {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
    {
        "id": 2,
        "name": "Faye Raker",
        "fitness": {"height": 130, "weight": 60},
    },
]

with open("test.json", "w") as file:
    json.dump(data, file)  # writes data to the file using json syntax (dump returns None, so don't assign its result)

If you had some records from a database that you want to put in a json file:
1. Retrieve the records from the database.
2. Put the records in a list.
3. Open a file for writing.
4. Dump the list to the file.
You can use indent in the dump command to make the file prettier, but the purpose of json is portable storage and retrieval. json is not a human friendly presentation format.
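For example, a small sketch of indent in action (writing to a temporary directory so it runs anywhere):

```python
import json
import tempfile
from pathlib import Path

data = [{"id": 1, "fitness": {"height": 130, "weight": 60}}]

# indent=4 pretty-prints the file; leave it out for compact,
# machine-oriented output.
path = Path(tempfile.mkdtemp()) / "pretty.json"
with open(path, "w") as file:
    json.dump(data, file, indent=4)

text = path.read_text()
print(text)
```

Either way, json.load() reads the file back identically; the indentation only affects how it looks to a human.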
This is a program that reads the json file and prints the list.
import json

with open("test.json", "r") as file:
    data = json.load(file)
print(data)

Output: [{'id': 1, 'name': 'Cole Volk', 'fitness': {'height': 130, 'weight': 60}}, {'name': 'Mark Reg', 'fitness': {'height': 130, 'weight': 60}}, {'id': 2, 'name': 'Faye Raker', 'fitness': {'height': 130, 'weight': 60}}]
To make it pretty I can use json.dumps() with indent.
import json

with open("test.json", "r") as file:
    data = json.load(file)
print(json.dumps(data, indent=4))

Output:
[
    {
        "id": 1,
        "name": "Cole Volk",
        "fitness": {
            "height": 130,
            "weight": 60
        }
    },
    {
        "name": "Mark Reg",
        "fitness": {
            "height": 130,
            "weight": 60
        }
    },
    {
        "id": 2,
        "name": "Faye Raker",
        "fitness": {
            "height": 130,
            "weight": 60
        }
    }
]
json.dumps(obj) is like json.dump(obj, file) except it produces a string instead of writing to a file. There is a corresponding json.loads(string) that is similar to json.load(file).
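A quick round trip showing the string variants:

```python
import json

obj = {"name": "Cole Volk", "fitness": {"height": 130, "weight": 60}}

# dumps: Python object -> JSON string, no file involved.
text = json.dumps(obj)

# loads: JSON string -> Python object again.
restored = json.loads(text)

print(text)
```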
Posts: 6,779
Threads: 20
Joined: Feb 2020
Apr-02-2025, 09:40 PM
(This post was last modified: Apr-02-2025, 09:41 PM by deanhystad.)
Sounds like you know all you need to proceed. Time for you to start writing some code. Trial and error is important at the early stages of learning.
I wrote this to see what happens when I use pandas to load the json file I created earlier.
import pandas as pd

df = pd.read_json("data.json")
print(df)

Output:
    id        name                        fitness
0  1.0   Cole Volk  {'height': 130, 'weight': 60}
1  NaN    Mark Reg  {'height': 130, 'weight': 60}
2  2.0  Faye Raker  {'height': 130, 'weight': 60}
Notice that fitness is a dictionary, not columns. If I try to write the pandas dataframe to a csv format file:
import pandas as pd

df = pd.read_json("data.json")
df.to_csv("data.csv")

I get this in data.csv:

Output:
,id,name,fitness
0,1.0,Cole Volk,"{'height': 130, 'weight': 60}"
1,,Mark Reg,"{'height': 130, 'weight': 60}"
2,2.0,Faye Raker,"{'height': 130, 'weight': 60}"
Not really a csv format file. That is why the dataframe had to be normalized.
import json
import pandas as pd

with open("data.json", "r") as file:
    data = json.load(file)
df = pd.json_normalize(data, max_level=1)
print(df)
df.to_csv("data.csv")

Output:
    id        name  fitness.height  fitness.weight
0  1.0   Cole Volk             130              60
1  NaN    Mark Reg             130              60
2  2.0  Faye Raker             130              60
data.csv:

,id,name,fitness.height,fitness.weight
0,1.0,Cole Volk,130,60
1,,Mark Reg,130,60
2,2.0,Faye Raker,130,60

Notice in your original post that the call to pd.json_normalize() takes data, a list of dictionaries, not an existing dataframe, as input. Read the json file to get the list of dictionaries, then use json_normalize(data) to create a dataframe that expands each dictionary into columns.
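A side note on deeper nesting: when a field holds a list of records rather than a nested dictionary, max_level alone will not flatten it; json_normalize has record_path and meta arguments for that case. A sketch with made-up field names (the "workouts" key is hypothetical, not from the data above):

```python
import pandas as pd

# Hypothetical data: each person carries a list of "workouts" records.
data = [
    {"name": "Cole Volk", "workouts": [{"day": "Mon", "km": 5}, {"day": "Tue", "km": 3}]},
    {"name": "Faye Raker", "workouts": [{"day": "Mon", "km": 8}]},
]

# record_path picks the list to explode into one row per record;
# meta carries parent-level fields down onto each of those rows.
df = pd.json_normalize(data, record_path="workouts", meta=["name"])
print(df)
```

Each workout becomes its own row, with the parent's name repeated alongside it.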
Your next post better have some code you wrote and error messages or sample input/output.
Posts: 3
Threads: 1
Joined: Apr 2025
(Apr-02-2025, 09:40 PM)deanhystad Wrote: Sounds like you know all you need to proceed. Time for you to start writing some code. Trial and error is important at the early stages of learning.
deanhystad,
thank you very much for your advice!
I do not have any code to post, yet, since I just used your few lines and it kinda worked.
I was able to upload 236 rows to a SQL Server table, which matches the number of records shown when running the print(df) command. The same goes for the number of columns: 127. That cannot be right; I believe the number of columns should be more like 250 ...
I will work on this all day tomorrow and try to come up with a sample input file. It has many levels of nesting, and I see many objects nested inside arrays ... the table is correct up to about the 10th column, where I see the first nesting, and that is where normalization seems to stop working. The column names no longer look like SomeName.SomeAttribute, etc. Strangely enough, this happens toward the end of the columns in the table, but by that point all the data is shifted and cannot be used.
Thank you again; please give me a little time to practice.
Regards,
elsvieta
Posts: 6,779
Threads: 20
Joined: Feb 2020
When printing large dataframes, not all rows and columns are printed. Pandas shows you the start and end and leaves out the middle. If you want to know the size of the dataframe, ask the dataframe.
import pandas as pd

df = pd.DataFrame([{"A": 1, "B": 2, "C": 3}, {"A": 4, "B": 5, "C": 6}])
print(df)
print(df.shape)

Output:
   A  B  C
0  1  2  3
1  4  5  6
(2, 3)
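Related: if you want print(df) to actually show every column of a wide dataframe (like your 127-column one), you can raise the display limits. A sketch:

```python
import pandas as pd

# A deliberately wide dataframe: 1 row, 30 columns.
df = pd.DataFrame([range(30)])

# By default pandas elides columns beyond display.max_columns;
# None means "no limit". display.width=None also stops the
# terminal-width logic from hiding columns.
pd.set_option("display.max_columns", None)
pd.set_option("display.width", None)
print(df)
print(df.shape)
```

Even so, df.shape is the reliable way to count rows and columns.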
If you don't know what your database looks like, download a browser like DB Browser.
Posts: 1,090
Threads: 143
Joined: Jul 2017
csv files and json files are just text files, no need to get complicated!
Your data is faulty: in SQL, a unique identifier, call it id if you like, is very important.
I don't like the NaN values in the id column in pandas; you are asking for problems in SQL. SQL likes a unique, auto-incremented PRIMARY KEY, so why would you have another one?
So forget the id key in your data, which is missing from some records, and create the id yourself so that every record has a unique ID:
import json

# this data is a list
data = [
    {
        "id": 1,
        "name": "Cole Volk",
        "fitness": {"height": 130, "weight": 60},
    },
    {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
    {
        "id": 2,
        "name": "Faye Raker",
        "fitness": {"height": 130, "weight": 60},
    },
    {"name": "Nick Cave", "fitness": {"height": 180, "weight": 100}},
]
print(len(data))  # 4 records

# save data as json
path2json = '/home/pedro/temp/myjson.json'
with open(path2json, 'w') as myjson:
    json.dump(data, myjson)

with open(path2json) as ajson:
    myjson = json.load(ajson)
# have a look at myjson.json
print(myjson)

path2csv = '/home/pedro/temp/mycsv.csv'
autoinc = 0
with open(path2csv, 'w') as csvfile:
    s = "id,name,height,weight\n"
    csvfile.write(s)
    for d in data:
        autoinc += 1
        mylist = [str(autoinc), d["name"], str(d["fitness"]["height"]), str(d["fitness"]["weight"])]
        s = ','.join(mylist) + '\n'
        csvfile.write(s)

mycsv.csv looks like:
Output: id,name,height,weight
1,Cole Volk,130,60
2,Mark Reg,130,60
3,Faye Raker,130,60
4,Nick Cave,180,100
Make an SQL table:
Quote:CREATE TABLE IF NOT EXISTS people (id INT AUTO_INCREMENT PRIMARY KEY NOT NULL, name VARCHAR(50) DEFAULT 'X',
height INT NOT NULL DEFAULT 5, weight INT NOT NULL DEFAULT 1) DEFAULT CHARACTER SET UTF8MB4 ENGINE=InnoDB
Now you can import the mycsv.csv into your database, skipping, of course, the first row, which is the column headers.
Thereafter, if you want to add 1 or more records, the SQL column id will auto increment:
Quote:INSERT INTO people (name, height, weight) VALUES ('Cocky Kevin', 185 , 66) ;
Additional records can be read from a csv, and INSERT commands issued for each record using Python. Each new record will get a unique PRIMARY KEY.
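That last step can be sketched with parameterized queries. The sketch below uses the standard-library sqlite3 purely as a stand-in for SQL Server so it is runnable anywhere; with a SQL Server driver such as pyodbc the pattern is the same, only the connection setup and the placeholder style differ:

```python
import sqlite3

# sqlite3 stands in for SQL Server here; the parameterized-INSERT
# pattern is the same with pyodbc, just different placeholders.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "name TEXT, height INTEGER, weight INTEGER)"
)

# These rows could come straight from csv.reader on mycsv.csv
# (skipping the header row).
rows = [("Cole Volk", 130, 60), ("Mark Reg", 130, 60)]

# Parameterized INSERT: the driver quotes the values safely,
# and the auto-increment column fills itself in.
con.executemany(
    "INSERT INTO people (name, height, weight) VALUES (?, ?, ?)", rows
)
con.commit()

for row in con.execute("SELECT id, name FROM people ORDER BY id"):
    print(row)
```

Never build INSERT statements by string concatenation; the placeholder form above is both safer and simpler.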