Python Forum
Grouping csv by name
#1
I'm converting a CSV file to JSON. I can already generate the JSON structure, but I need to group the data by name.

import csv

result = []
with open("student.csv", "r") as csv_ledger:
    for row in csv.DictReader(csv_ledger, skipinitialspace=True):
        result.append({
            "name": row["name"],
            "email": row["email"],
            "items": [{
                "phone": row["phone"],
                "info": {"date": row["date"]},
            }],
        })
Reply
#2
Please post the CSV file.
Reply
#3
Thanks for the reply.

CSV file
name,email,date,phone
john,example.com,26/11/18,123
john,hello.com,12/08/18,123456
Reply
#4
If you use name as the key, there will be a duplicate key conflict, since two rows have john as the name.
Reply
#5
Thanks for the reply.

Updated CSV file:
name,email,date,phone
john,example.com,26/11/18,123
johnny,hello.com,12/08/18,123456
I'm using defaultdict:
import csv
from collections import defaultdict

result2 = defaultdict(list)
result = []
with open("student.csv", "r") as csv_ledger:
    for row in csv.DictReader(csv_ledger, skipinitialspace=True):
        result.append({
            "name": row["name"],
            "email": row["email"],
            "items": [{
                "phone": row["phone"],
                "info": {"date": row["date"]},
            }],
        })
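        # Works, but only collects the phone; I want to append the whole items list.
        result2[row['name']].append(row['phone'])
        # Not supported: the CSV row has no 'items' column, so this raises KeyError.
        # result2[row['name'], row['email']].append(row['items'])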
The result of the above looks like:

"john": [
        "123"
    ]
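A minimal sketch of appending the whole item instead of only the phone. It assumes the columns shown in the CSV above and inlines the data with io.StringIO so it runs without student.csv; the names CSV_TEXT and group_items_by_name are made up for illustration.

```python
import csv
import io
import json
from collections import defaultdict

# Sample data inlined so the sketch runs without student.csv.
CSV_TEXT = """name,email,date,phone
john,example.com,26/11/18,123
johnny,hello.com,12/08/18,123456
"""


def group_items_by_name(csv_file):
    """Map each name to a list of item dicts built from its rows."""
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file, skipinitialspace=True):
        # Build the item from the row's columns; the row itself has no "items" key.
        item = {"phone": row["phone"], "info": {"date": row["date"]}}
        grouped[row["name"]].append(item)
    # Convert to a plain dict so it prints and serializes like a normal mapping.
    return dict(grouped)


grouped = group_items_by_name(io.StringIO(CSV_TEXT))
print(json.dumps(grouped, indent=4))
```

Each name keeps its own list, so rows that share a name are appended to the same list rather than replacing one another.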
Reply
#6
But the second CSV record with the name john will overwrite the first.
import csv
import os
 
# I need following for my virtual environment, you can remove if
# running python from same directory as csv file.
os.chdir(os.path.abspath(os.path.dirname(__file__)))

result = []

plain_dict = {}
# To get flat normal dict, can just use dict(ordered_dict)
# but to get name as key:
with open("student.csv", "r") as csv_ledger:
    name = None
    for row in csv.DictReader(csv_ledger, skipinitialspace=True):
        for k, v in row.items():
            if k == 'name':
                name = v
                plain_dict[name] = {}
            else:
                plain_dict[name][k] = v
    print(plain_dict)
And since there are two CSV rows with the same name, the second overwrites the first.
Result:
{'john': {'email': 'hello.com', 'date': '12/08/18', 'phone': '123456'}}
But it can be saved with json.dump.

Code again, with json.dump and rereading the JSON:
import csv
import json
import os
 
# I need following for my virtual environment, you can remove if
# running python from same directory as csv file.
os.chdir(os.path.abspath(os.path.dirname(__file__)))

result = []

plain_dict = {}
# To get flat normal dict, can just use dict(ordered_dict)
# but to get name as key:
with open("student.csv", "r") as csv_ledger:
    name = None
    for row in csv.DictReader(csv_ledger, skipinitialspace=True):
        for k, v in row.items():
            if k == 'name':
                name = v
                plain_dict[name] = {}
            else:
                plain_dict[name][k] = v

    print(f'plain_dict: {plain_dict}')
    with open('student.json', 'w') as jp:
        json.dump(plain_dict, jp)
    
    # reload json to make sure ok
    with open('student.json') as jp:
        new_dict = json.load(jp)
    print(f'new_dict: {new_dict}')
Output:
plain_dict: {'john': {'email': 'hello.com', 'date': '12/08/18', 'phone': '123456'}}
new_dict: {'john': {'email': 'hello.com', 'date': '12/08/18', 'phone': '123456'}}
Reply
#7
Thanks for the reply.

My expected result looks like this:
  {
    "name": "john",
    "email": "example.com",
    "items": [
      {
        "phone": "123",
        "info": {
          "date": "26/11/18"
        }
      },
      {
        "phone": "123456",
        "info": {
          "date": "12/08/18"
        }
      }
    ]
  }
The other values will be inserted according to the name. I will try to follow your method.
Btw, my first code doesn't group yet. Is it possible to turn the top-level array [] into an object {}, or just remove the top-level []?
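One way to get that shape, as a rough sketch rather than a definitive answer. It assumes the first email seen for a name is the one to keep (which matches the expected result above) and inlines the two john rows from the original CSV; CSV_TEXT and group_rows are illustrative names.

```python
import csv
import io
import json

# The original sample data, inlined so the sketch runs without student.csv.
CSV_TEXT = """name,email,date,phone
john,example.com,26/11/18,123
john,hello.com,12/08/18,123456
"""


def group_rows(csv_file):
    """Return {name: record}, where each record collects that name's items."""
    groups = {}
    for row in csv.DictReader(csv_file, skipinitialspace=True):
        # Assumption: the first email seen for a name is kept; later ones are ignored.
        record = groups.setdefault(
            row["name"],
            {"name": row["name"], "email": row["email"], "items": []},
        )
        record["items"].append(
            {"phone": row["phone"], "info": {"date": row["date"]}}
        )
    return groups


groups = group_rows(io.StringIO(CSV_TEXT))
as_array = list(groups.values())  # top-level [] of grouped records
as_object = groups                # top-level {} keyed by name
print(json.dumps(as_array, indent=2))
```

Dumping as_object instead of as_array gives a top-level {} keyed by name. If the file is known to contain a single name, as_array[0] is the bare record without the surrounding [].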
Reply
#8
If this isn't an assignment, I would suggest using pandas. Pandas has a lot of useful tools
for handling such problems; look at the io submodule for I/O operations, and at groupby.
With pandas, your code would be significantly shorter.
Reply
#9
According to the initial data sample, the same name can appear with different e-mail addresses:

name,email,date,phone
john,example.com,26/11/18,123
john,hello.com,12/08/18,123456
If this is the actual case, then you should build a data structure that supports multiple e-mail addresses under one name.
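A sketch of one such structure, nesting name, then e-mail, then a list of items. The item layout is borrowed from the first post; the nesting itself and the names CSV_TEXT and group_by_name_and_email are assumptions made for illustration, not something the thread prescribes.

```python
import csv
import io
from collections import defaultdict

# The initial sample data, inlined so the sketch runs without student.csv.
CSV_TEXT = """name,email,date,phone
john,example.com,26/11/18,123
john,hello.com,12/08/18,123456
"""


def group_by_name_and_email(csv_file):
    """Return {name: {email: [items]}} so no e-mail address is dropped."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(csv_file, skipinitialspace=True):
        grouped[row["name"]][row["email"]].append(
            {"phone": row["phone"], "info": {"date": row["date"]}}
        )
    # Convert to plain dicts so the result prints and serializes cleanly.
    return {name: dict(emails) for name, emails in grouped.items()}


by_name = group_by_name_and_email(io.StringIO(CSV_TEXT))
print(by_name)
```

Because both levels use string keys, the result can be passed straight to json.dump, unlike a dict keyed by (name, email) tuples.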
Reply

