Grouping csv by name - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: General Coding Help (https://python-forum.io/forum-8.html) +--- Thread: Grouping csv by name (/thread-15341.html) |
Grouping csv by name - terrydidi - Jan-13-2019 I'm converting the csv format to json format. Now I generated the json structure but I need to group the data by name. import csv result = [] with open("student.csv", "r") as csv_ledger: for row in csv.DictReader(csv_ledger, skipinitialspace=True): result.append({ "name": row["name"], "email": row["email"], "items": [{ "phone": row["phone"], "info": {"date": row["date"]}, }], }) RE: Grouping csv by name - Larz60+ - Jan-13-2019 please post csv file RE: Grouping csv by name - terrydidi - Jan-13-2019 Thanks for reply. CSV file name,email,date,phone john,example.com,26/11/18,123 john,hello.com,12/08/18,123456 RE: Grouping csv by name - Larz60+ - Jan-13-2019 if you use name for key, there will be a duplicate key conflict since two rows have john as name RE: Grouping csv by name - terrydidi - Jan-13-2019 Thanks for reply updated CSV file name,email,date,phone john,example.com,26/11/18,123 johnny,hello.com,12/08/18,123456I'm using the defaultdict import csv from collections import defaultdict result2 = defaultdict(list) result = [] with open("student.csv", "r") as csv_ledger: for row in csv.DictReader(csv_ledger, skipinitialspace=True): result.append({ "name": row["name"], "email": row["email"], "items": [{ "phone": row["phone"], "info": {"date": row["date"]}, }], }) result2[row['name']].append(row['phone'])<---append whole items list result2[row['name'],row['email']].append(row['items'])<---not supportThe result above is like "john": [ "123" ] RE: Grouping csv by name - Larz60+ - Jan-14-2019 but the second csv record with name john will overwrite the first. import csv import os # I need following for my virtual environment, you can remove if # running python from same directory as csv file. os.chdir(os.path.abspath(os.path.dirname(__file__))) result = [] plain_dict = {} # To get flat normal dict, can just use dict(ordered_dict) # but to get name as key: with open("student.csv", "r") as csv_ledger: name = None reader = csv.DictReader(csv_ledger) for row in csv.DictReader(csv_ledger, skipinitialspace=True): for k, v in row.items(): if k == 'name': name = v plain_dict[name] = {} else: plain_dict[name][k] = v print(plain_dict)and since there are two csv rows with sane name, second overwrites first. result: {'john': {'email': 'hello.com', 'date': '12/08/18', 'phone': '123456'}} but can be saved with json.dump code again with json dump and reread json: import csv import json import os # I need following for my virtual environment, you can remove if # running python from same directory as csv file. os.chdir(os.path.abspath(os.path.dirname(__file__))) result = [] plain_dict = {} # To get flat normal dict, can just use dict(ordered_dict) # but to get name as key: with open("student.csv", "r") as csv_ledger: name = None reader = csv.DictReader(csv_ledger) for row in csv.DictReader(csv_ledger, skipinitialspace=True): for k, v in row.items(): if k == 'name': name = v plain_dict[name] = {} else: plain_dict[name][k] = v print(f'plain_dict: {plain_dict}') with open('student.json', 'w') as jp: json.dump(plain_dict, jp) # reload json to make sure ok with open('student.json') as jp: new_dict = json.load(jp) print(f'new_dict: {new_dict}')output:
RE: Grouping csv by name - terrydidi - Jan-14-2019 Thanks for the reply. My expected result is like { "name": "john", "email": "example.com", "items": [ { "phone": "123", "info": { "date": "26/11/18", } }, { "phone": "123456", "info": { "date": "12/08/18", } }, ] }The other value will insert according to the name. I will try to refer your method. Btw, based on my first code will not group yet, is it possible to transform the top-level array [] to object {}?Or just remove the top-level []. RE: Grouping csv by name - scidam - Jan-14-2019 If this isn't an assignment, I would suggest to use Pandas. Pandas has a lot of useful tool to handle such problems, look at io submodule for i/o operations and groupby. If you used Pandas, your code would be significantly shorter. RE: Grouping csv by name - perfringo - Jan-14-2019 According to initial data sample there could be same name and different mail addresses: If this is actual case then you should build data structure which supports multiple e-mail addresses under one name.
|