Posts: 6
Threads: 2
Joined: Jan 2019
I'm converting a CSV file to JSON. I have generated the JSON structure, but I still need to group the data by name.
import csv

result = []
with open("student.csv", "r") as csv_ledger:
    for row in csv.DictReader(csv_ledger, skipinitialspace=True):
        result.append({
            "name": row["name"],
            "email": row["email"],
            "items": [{
                "phone": row["phone"],
                "info": {"date": row["date"]},
            }],
        })
Posts: 12,033
Threads: 486
Joined: Sep 2016
Posts: 6
Threads: 2
Joined: Jan 2019
Thanks for the reply.
CSV file:
name,email,date,phone
john,example.com,26/11/18,123
john,hello.com,12/08/18,123456
Posts: 12,033
Threads: 486
Joined: Sep 2016
If you use name as the key, there will be a duplicate-key conflict, since two rows have john as the name.
Posts: 6
Threads: 2
Joined: Jan 2019
Jan-13-2019, 11:05 PM
(This post was last modified: Jan-13-2019, 11:05 PM by terrydidi.)
Thanks for the reply.
Updated CSV file:
name,email,date,phone
john,example.com,26/11/18,123
johnny,hello.com,12/08/18,123456
I'm using defaultdict:
import csv
from collections import defaultdict

result2 = defaultdict(list)
result = []
with open("student.csv", "r") as csv_ledger:
    for row in csv.DictReader(csv_ledger, skipinitialspace=True):
        result.append({
            "name": row["name"],
            "email": row["email"],
            "items": [{
                "phone": row["phone"],
                "info": {"date": row["date"]},
            }],
        })
        # Works, but I want to append the whole items list instead of the phone:
        result2[row['name']].append(row['phone'])
        # Not supported (the CSV row has no 'items' column, so this raises KeyError):
        # result2[row['name'], row['email']].append(row['items'])
The result of the code above looks like:
"john": [
"123"
]
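One way to append the whole item instead of only the phone is to build the item dict first and append that to the defaultdict. This is a minimal sketch, not a definitive answer: the CSV is inlined through io.StringIO so it runs without student.csv, and group_items_by_name is a hypothetical helper name.

```python
import csv
import io
import json
from collections import defaultdict

# Inline copy of the updated CSV so the sketch runs without student.csv.
CSV_TEXT = """name,email,date,phone
john,example.com,26/11/18,123
johnny,hello.com,12/08/18,123456
"""


def group_items_by_name(csv_file):
    """Map each name to a list of item dicts (hypothetical helper)."""
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file, skipinitialspace=True):
        # Build the nested item from the flat CSV columns, then append it whole.
        item = {"phone": row["phone"], "info": {"date": row["date"]}}
        grouped[row["name"]].append(item)
    return dict(grouped)


grouped = group_items_by_name(io.StringIO(CSV_TEXT))
print(json.dumps(grouped, indent=2))
```

A tuple key such as (row['name'], row['email']) is accepted by defaultdict itself, but json.dump cannot serialize tuple keys, so a string key is the safer choice if the result has to end up as JSON.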
Posts: 12,033
Threads: 486
Joined: Sep 2016
But the second CSV record with the name john will overwrite the first.
import csv
import os

# I need the following for my virtual environment; you can remove it if
# running Python from the same directory as the csv file.
os.chdir(os.path.abspath(os.path.dirname(__file__)))

plain_dict = {}
# To get a flat normal dict, you can just use dict(ordered_dict),
# but to get name as the key:
with open("student.csv", "r") as csv_ledger:
    name = None
    reader = csv.DictReader(csv_ledger, skipinitialspace=True)
    for row in reader:
        for k, v in row.items():
            if k == 'name':
                # 'name' is the first column, so it is seen before the other fields
                name = v
                plain_dict[name] = {}
            else:
                plain_dict[name][k] = v
print(plain_dict)
And since there are two CSV rows with the same name, the second overwrites the first.
result:
{'john': {'email': 'hello.com', 'date': '12/08/18', 'phone': '123456'}}
It can still be saved with json.dump.
Here is the code again, with a JSON dump and a re-read of the JSON:
import csv
import json
import os

# I need the following for my virtual environment; you can remove it if
# running Python from the same directory as the csv file.
os.chdir(os.path.abspath(os.path.dirname(__file__)))

plain_dict = {}
# To get a flat normal dict, you can just use dict(ordered_dict),
# but to get name as the key:
with open("student.csv", "r") as csv_ledger:
    name = None
    reader = csv.DictReader(csv_ledger, skipinitialspace=True)
    for row in reader:
        for k, v in row.items():
            if k == 'name':
                name = v
                plain_dict[name] = {}
            else:
                plain_dict[name][k] = v
print(f'plain_dict: {plain_dict}')

with open('student.json', 'w') as jp:
    json.dump(plain_dict, jp)

# reload the json to make sure it is ok
with open('student.json') as jp:
    new_dict = json.load(jp)
print(f'new_dict: {new_dict}')
Output:
plain_dict: {'john': {'email': 'hello.com', 'date': '12/08/18', 'phone': '123456'}}
new_dict: {'john': {'email': 'hello.com', 'date': '12/08/18', 'phone': '123456'}}
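If the overwrite is not wanted, one option is to store a list of rows per name instead of a single dict. The sketch below is an assumption about what might be useful here, not a change to the code above: it inlines the first CSV sample through io.StringIO, and rows_by_name is a made-up variable name.

```python
import csv
import io
import json

# Inline copy of the first CSV sample (two rows share the name john).
CSV_TEXT = """name,email,date,phone
john,example.com,26/11/18,123
john,hello.com,12/08/18,123456
"""

rows_by_name = {}
for row in csv.DictReader(io.StringIO(CSV_TEXT), skipinitialspace=True):
    # pop() takes the name out of the row, wherever that column sits.
    name = row.pop("name")
    # setdefault() creates the list on first sight of a name, then reuses it.
    rows_by_name.setdefault(name, []).append(row)

# Round-trip through JSON to check that the structure serializes cleanly.
round_tripped = json.loads(json.dumps(rows_by_name))
print(round_tripped)
```

Both john rows survive because each name maps to a list, and the loop no longer depends on name being the first column.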
Posts: 6
Threads: 2
Joined: Jan 2019
Jan-14-2019, 12:29 AM
(This post was last modified: Jan-14-2019, 12:30 AM by terrydidi.)
Thanks for the reply.
My expected result looks like this:
{
    "name": "john",
    "email": "example.com",
    "items": [
        {
            "phone": "123",
            "info": {
                "date": "26/11/18"
            }
        },
        {
            "phone": "123456",
            "info": {
                "date": "12/08/18"
            }
        }
    ]
}
The other values should be inserted according to the name. I will try to follow your method.
By the way, my first code does not group yet. Is it possible to transform the top-level array [] into an object {}, or just to remove the top-level []?
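A possible way to get that shape with only the standard library is sketched below. It assumes the first e-mail seen for a name is the one to keep (as in the expected result), inlines the CSV through io.StringIO, and uses a hypothetical helper name, group_records.

```python
import csv
import io
import json

# Inline copy of the first CSV sample so the sketch runs without student.csv.
CSV_TEXT = """name,email,date,phone
john,example.com,26/11/18,123
john,hello.com,12/08/18,123456
"""


def group_records(csv_file):
    """Return one record per name, each with an 'items' list (hypothetical helper)."""
    records = {}  # name -> record; dicts keep insertion order
    for row in csv.DictReader(csv_file, skipinitialspace=True):
        # Assumption: the e-mail of the first row for a name is the one kept.
        record = records.setdefault(
            row["name"],
            {"name": row["name"], "email": row["email"], "items": []},
        )
        record["items"].append(
            {"phone": row["phone"], "info": {"date": row["date"]}}
        )
    return list(records.values())


records = group_records(io.StringIO(CSV_TEXT))
# Only one name here, so records[0] is the object without the enclosing [].
print(json.dumps(records[0], indent=4))
```

On the top-level question: dumping records gives the array form, dumping records[0] gives a single object, and with several names a dict keyed by name would give an object at the top level instead of an array.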
Posts: 817
Threads: 1
Joined: Mar 2018
If this isn't an assignment, I would suggest using Pandas. Pandas has a lot of useful tools
for handling such problems; look at its I/O functions for reading and writing data, and at groupby.
If you used Pandas, your code would be significantly shorter.
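As a rough illustration of the Pandas route, the sketch below reads the sample with read_csv and groups it with groupby. It assumes pandas is installed, inlines the CSV through io.StringIO, reads every column as a string so phone numbers stay text, and keeps the first e-mail per name; none of this comes from the original posts.

```python
import io
import json

import pandas as pd

# Inline copy of the first CSV sample so the sketch runs without student.csv.
CSV_TEXT = """name,email,date,phone
john,example.com,26/11/18,123
john,hello.com,12/08/18,123456
"""

# dtype=str keeps phone numbers and dates as plain strings.
df = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str, skipinitialspace=True)

records = []
# sort=False keeps the names in the order they first appear in the file.
for name, group in df.groupby("name", sort=False):
    records.append({
        "name": name,
        "email": group["email"].iloc[0],  # assumption: first e-mail wins
        "items": [
            {"phone": phone, "info": {"date": date}}
            for phone, date in zip(group["phone"], group["date"])
        ],
    })

print(json.dumps(records, indent=2))
```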
Posts: 1,950
Threads: 8
Joined: Jun 2018
According to the initial data sample, the same name can appear with different e-mail addresses:
name,email,date,phone
john,example.com,26/11/18,123
john,hello.com,12/08/18,123456
If this is the actual case, then you should build a data structure that supports multiple e-mail addresses under one name.
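One possible shape for such a structure is a record per name that holds a list of e-mail addresses next to the list of items. This is only a sketch under assumptions: the "emails" key and the people variable are invented for illustration, and the CSV is inlined through io.StringIO.

```python
import csv
import io
import json

# Inline copy of the initial data sample.
CSV_TEXT = """name,email,date,phone
john,example.com,26/11/18,123
john,hello.com,12/08/18,123456
"""

people = {}  # name -> record with every e-mail and item seen for that name
for row in csv.DictReader(io.StringIO(CSV_TEXT), skipinitialspace=True):
    record = people.setdefault(
        row["name"], {"name": row["name"], "emails": [], "items": []}
    )
    # Keep each address once, in the order it first appears.
    if row["email"] not in record["emails"]:
        record["emails"].append(row["email"])
    record["items"].append({"phone": row["phone"], "info": {"date": row["date"]}})

print(json.dumps(list(people.values()), indent=2))
```

If it matters which address belongs to which phone number, the e-mail could instead be stored inside each item.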