Python Forum
Combine Duplicate CVS Column Values
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Combine Duplicate CVS Column Values
#1
Hi All,
This is my first cry for help Big Grin

I'm a Python Noob and I have been progressing fairly well (I started learning Python about a month ago). I have come to the point in my code that I simply cannot figure out. I am trying to take a simple .csv file and tally duplicates and combine values. I have tried dictionaries, tuples, sets, etc etc but I am simply missing something.

My production .csv is about 150k lines long and is a log for commodities market state change messages... so I created a basic example of what I am dealing with. I have pretty good luck with regex and could probably solve this problem via regex.. but I know that is slow.

Here is my example .csv... lets say I'm a car dealer that owns several car lots.. so I have duplicates in the list.

ford, 3
ford, 3
honda, 1
dodge, 8
dodge, 2
dodge, 5
ford, 6
dodge, 2
chevy, 1
dodge, 5

My goal is to have a combined list that would look like this... the order in the result does not matter.. I just need totals

chevy, 1
dodge, 22
ford, 9
honda, 1


So far I have coded the following that parses the .csv and creates a list of car makes, and another list that has the car make and quantity.


import csv

log = ('C:\\Users\\' + 'jcastleb' + '\\Desktop\\' + 'cars.csv')


f = open(log)
csv_f = csv.reader(f)

#Create empty lists
carlist = []
car_count = []
result = []

for row in csv_f:
    #Create list of Car Makes and Numbers
    car_count.append(row)

    #Create the list of cars
    if row[0] not in carlist:
        carlist.append(row[0])

    if row[0] not in result:
        result.append(row[0])

print(carlist)
print(car_count)
I get the following when I run the script:

Output:
['ford', 'honda', 'dodge', 'chevy'] [['ford', ' 3'], ['ford', ' 3'], ['honda', ' 1'], ['dodge', ' 8'], ['dodge', ' 2'], ['dodge', ' 5'], ['ford', ' 6'], ['dodge', ' 2'], ['chevy', ' 1'], ['dodge', ' 5']]


If anyone can advise on how I continue the code to get the total I would be greatly appreciative!  If there is a more efficient method then mine please do let me know. Big Grin

I have a hunch that a dictionary may be the way to go but I could not figure out how to append the key values Sad

Thanks all!
Reply
#2
>>> cars={'ford':3,'chevy':2}
>>> 'ford' in cars
True
>>> 'dodge' in cars
False
>>> cars['ford']+=3
>>> cars['dodge']=5
>>> cars
{'chevy': 2, 'dodge': 5, 'ford': 6}
You can make it even simpler using  a defaultdict. A defaultdict creates the item for you with a default value
when you reference it:
>>>import collections
# The dict will set a default value by calling a function
# so we give it an anonymous function that always returns 0 
>>> cars=collections.defaultdict(lambda:0)
# checking what happens when we reference something we haven't got yet
>>> cars['ford']
0
>>> cars['ford']+=3
# We can blindly add stuff, "dodge' isn't in the dict yet
>>> cars['dodge']+=8
>>> cars['dodge']+=3
>>> cars
defaultdict(<function <lambda> at 0x7f0131d4d050>, {'dodge': 11, 'ford': 3})
Unless noted otherwise, code in my posts should be understood as "coding suggestions", and its use may require more neurones than the two necessary for Ctrl-C/Ctrl-V.
Your one-stop place for all your GIMP needs: gimp-forum.net
Reply
#3
Thank you Ofnuts! Your solution worked great. Big Grin Dance

This is the final code using the method you suggested.

import collections
import csv
from timeit import default_timer as timer
start = timer()
#Set up log path and csv reader
log = ('C:\\Users\\' + 'jcastleb' + '\\Desktop\\' + 'endexlog.csv')
f = open(log)
csv_f = csv.reader(f)
#Create default dictionary 
counts = collections.defaultdict(lambda:0)
#Create empty lists
market_types = []
market_msg = []
strip_counts = []
#Function to update Market mnessage list
def market_message_parser():
        for row in csv_f:
            market_msg.append(row)
#Function to update dictionary with market names
def dictionary_names():
    for row in market_msg:
        if row[2] not in market_types:
            counts[(row[2])]
 
#Function to pull number of strips per market and update strip counts
def strip_parser():
    for row in market_msg:
        pull = (row[2], row[5])
        strip_counts.append(pull)
        
#Run List functions
market_message_parser()
dictionary_names()
strip_parser()
#update dictionary with counts
for row in strip_counts:
    name = row[0]
    number = int(row[1])
    counts[name] += number
print('Final Dictionary calculated:')
print()
print(counts)
print()
print()
print('Time to calculate')
end = timer()
print(end - start)
Reply
#4
Think it look better to set list in defaultdict.
The sum that list up.
Eg.
from collections import defaultdict

data = '''\
ford, 3
ford, 3
honda, 1
dodge, 8
dodge, 2
dodge, 5
ford, 6
dodge, 2
chevy, 1
dodge, 5'''

data = [i.split(', ') for i in data.split('\n')]
d = defaultdict(list)
[d[k].append(int(v)) for k, v in data]
print({k: sum(v) for (k, v) in d.items()})
Output:
{'dodge': 22, 'honda': 1, 'ford': 12, 'chevy': 1}
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Get an average of the unique values of a column with group by condition and assign it klllmmm 0 289 Feb-17-2024, 05:53 PM
Last Post: klllmmm
  Converting column of values into muliple columns of counts highland44 0 260 Feb-01-2024, 12:48 AM
Last Post: highland44
  PowerBI: Using Python Regex to look for values in column MarcusR44 1 983 Oct-14-2022, 01:03 PM
Last Post: ibreeden
  How to combine multiple column values into 1? cubangt 15 2,872 Aug-11-2022, 08:25 PM
Last Post: cubangt
  Average values on duplicate records chandramouliarun 1 837 Jul-27-2022, 03:13 AM
Last Post: deanhystad
  Creating a numpy array from specific values of a spreadsheet column JulianZ 0 1,135 Apr-19-2022, 07:36 AM
Last Post: JulianZ
  How to split file by same values from column from imported CSV file? Paqqno 5 2,808 Mar-24-2022, 05:25 PM
Last Post: Paqqno
  Float Slider - Affecting Values in Column 'Pandas' planckepoch86 0 1,405 Jan-22-2022, 02:18 PM
Last Post: planckepoch86
  Pandas DataFrame combine rows by column value, where Date Rows are NULL rhat398 0 2,125 May-04-2021, 10:51 PM
Last Post: rhat398
  How to get value in Dataframe given row & column values? moonlight 1 2,437 Apr-26-2021, 09:30 PM
Last Post: Larz60+

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020