Python Forum
Dealing with duplicated data in a CSV file
#7
If the file is already sorted on the first column, you can use itertools.groupby() to keep only the first row for each value of that column.
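# untested code but you get the idea
import csv
import itertools as itt
from operator import itemgetter

def unique_rows(rows):
    # groupby() only merges consecutive rows with the same key,
    # which is why the input must already be sorted on column 0
    for key, group in itt.groupby(rows, key=itemgetter(0)):
        yield next(group)  # keep the first row of each group

def main():
    # newline='' is what the csv docs recommend for files passed to reader/writer
    with open('input.csv', newline='') as ifh, open('output.csv', 'w', newline='') as ofh:
        rd = csv.reader(ifh)
        wt = csv.writer(ofh)
        wt.writerows(unique_rows(rd))

if __name__ == '__main__':
    main()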
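To see why the "already sorted" condition matters, here is a small sketch that runs the same groupby() idea on in-memory data. The sample text, the demo() helper and unique_rows_any_order() are my own illustrative names, not something from the question; the second function is a different technique (a set of keys already seen) that I am assuming would be acceptable when the file is not sorted and you want the first row per key.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

def unique_rows_sorted(rows):
    """Keep the first row of each run of consecutive rows sharing column 0."""
    for _key, group in groupby(rows, key=itemgetter(0)):
        yield next(group)

def unique_rows_any_order(rows):
    """Keep the first row seen for each column-0 value; no sorting needed."""
    seen = set()  # keys already emitted
    for row in rows:
        if row[0] not in seen:
            seen.add(row[0])
            yield row

# Hypothetical sample data, deliberately NOT sorted on the first column
SAMPLE = "a,1\na,2\nb,3\na,4\n"

def demo(func, text=SAMPLE):
    """Parse the CSV text and return the rows kept by func as a list."""
    return list(func(csv.reader(io.StringIO(text))))

if __name__ == '__main__':
    print(demo(unique_rows_sorted))
    print(demo(unique_rows_any_order))
```

On unsorted input the groupby() version lets the second run of 'a' rows through, because it only compares neighbouring rows. The set-based version avoids that at the cost of holding every distinct key in memory.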


Messages In This Thread
RE: Dealing with duplicated data in a CSV file - by Gribouillis - Sep-05-2021, 06:44 PM

