Posts: 2
Threads: 2
Joined: Sep 2020
Hi All,
We have one requirement to convert Json to CSV file. In json file, values are there for all the columns, some columns are missing but when data populated in CSV file for those values missing columns should have blank/null, next column values should not overwrite.
Json Sample data:
{"results":[{"fruit": "Apple","size": "Large","color": "Red"},{"fruit": "Banana","color": "Yellow"},{"fruit": "Watermelon","size": "Large"},{"fruit": "Orange","color": "Orange"}]}
Expected format:
Fruit,Size,Color
Apple,Large,Red
Banana,,Yellow
Watermelon,Large,
Orange,,Orange
Require your help in script preparation. Whether we can handle all the columns/Headings while reading data from json file or do we need to handle all the columns/headings while populating data in csv file...Please advise
Thanks in advance
Posts: 1,032
Threads: 16
Joined: Dec 2016
Show what you tried and what error message you got.
Posts: 8,160
Threads: 160
Joined: Sep 2016
Do you know all fields in advance or do you need to handle them dynamically?
Posts: 6,798
Threads: 20
Joined: Feb 2020
Apr-25-2023, 07:43 PM
(This post was last modified: Apr-25-2023, 07:43 PM by deanhystad.)
Can you use pandas? Using pandas this is a few lines of code.
I don't understand this:
Output: Whether we can handle all the columns/Headings while reading data from json file
You don't read a json file, you load it. The json.load(file) command reads the entire json and returns Python objects. In this case it returns a dictionary containing a list of dictionaries. You'll have to scan through the list to find all the keys. The keys will be the column headings for your csv file. Then you'll have to unpack the dictionaries into some kind of table, or maybe just write directly to the csv file. Either way you'll have to provide a filler when a dictionary does not have all the heading keys.
Posts: 2,125
Threads: 11
Joined: May 2017
Apr-26-2023, 07:39 AM
(This post was last modified: Apr-26-2023, 07:39 AM by DeaD_EyE.)
You should use first csv.
import csv
import io
results = {
"results": [
{"fruit": "Apple", "size": "Large", "color": "Red"},
{"fruit": "Banana", "color": "Yellow"},
{"fruit": "Watermelon", "size": "Large"},
{"fruit": "Orange", "color": "Orange"},
]
}
fields = tuple(results["results"][0])
with io.StringIO() as fake_file:
writer = csv.DictWriter(fake_file, fieldnames=fields)
writer.writeheader()
for row in results["results"]:
writer.writerow(row)
print(fake_file.getvalue()) The shorter version uses the method writerows, where the for-loop is inside the csv-module:
import csv
import io
results = {
"results": [
{"fruit": "Apple", "size": "Large", "color": "Red"},
{"fruit": "Banana", "color": "Yellow"},
{"fruit": "Watermelon", "size": "Large"},
{"fruit": "Orange", "color": "Orange"},
]
}
fields = tuple(results["results"][0])
with io.StringIO() as fake_file:
writer = csv.DictWriter(fake_file, fieldnames=fields)
writer.writeheader()
writer.writerows(results["results"])
print(fake_file.getvalue()) Output:
Output: fruit,size,color
Apple,Large,Red
Banana,,Yellow
Watermelon,Large,
Orange,,Orange
Posts: 8,160
Threads: 160
Joined: Sep 2016
Apr-26-2023, 08:31 AM
(This post was last modified: Apr-26-2023, 08:31 AM by buran.)
(Apr-26-2023, 07:39 AM)DeaD_EyE Wrote: fields = tuple(results["results"][0]) Note, it could be just coincidence that first element has all possible keys. That's why I asked if OP knows full list of field names in advance. Otherwise they need to loop over all dicts and combine keys. In this case it's also important to clarify if the order of the field names in the csv matters
Posts: 2,125
Threads: 11
Joined: May 2017
Apr-26-2023, 02:37 PM
(This post was last modified: Apr-26-2023, 02:37 PM by DeaD_EyE.)
If each data point has different keys, you can collect them:
#modified example data
results = {
"results": [
{"fruit": "Apple", "size": "Large", "color": "Red"},
{"fruit": "Banana", "color": "Yellow"},
{"fruit": "Watermelon", "size": "Large", "humidity": 0.9},
{"fruit": "Orange", "color": "Orange", "diameter": 33.5},
]
}
# getting all keys from all results and keeping the insertion order
fields = tuple(dict.fromkeys([key for row in results["results"] for key in row])) The nested list comprehension as nested for-loops:
fields = {}
for row in results["results"]:
for key in row:
print(f"Adding {key} to fields.")
fields[key] = None
print("Next row", end="\n\n")
print()
print("Fields before it's converted to a list or tuple")
print(fields)
print()
print("After conversion to a tuple")
fields = tuple(fields)
print(fields) If you don't care about order, you could use a set instead. If you want to sort the fields, it's also possible.
sorted_fields = sorted(set(key for result in results["results"] for key in result.keys())) Or as a for-loop:
fields = set()
for result in results["results"]:
for key in result:
fields.add(key)
fields = sorted(fields)
Posts: 6,798
Threads: 20
Joined: Feb 2020
Though this is not posted in the Homework Forum, it is obviously a homework question.
Posts: 2,125
Threads: 11
Joined: May 2017
Oh... the good part is, that it's not the complete solution.
Reading and writing is not included.
|