Python Forum
Dealing with duplicated data in a CSV file
#4
I am new to Python but I have been programming for half a century. The following uses techniques I might have used in COBOL, but written in Python of course. My data is a bit different from what you specify, but I hope it is close enough. When you ask a question, providing a small amount of sample data, as in the following, helps those who want to help you. I am using:

key,f1,f2
1,data1,data2
2,data3,data4
3,data5,data6
4,data7,data8
5,data9,data10
5,data11,data12
6,data13,data14
7,data15,data16
8,data17,data18
8,data19,data20
9,data21,data22
9,data23,data24
10,data25,data26

The following:

import sys
import csv

with open('WithDuplicates.csv', newline='') as csvfile:
    dupsreader = csv.reader(csvfile, delimiter=',')
    datalist = list(dupsreader)    # reads the whole file, header included

n = len(datalist)
print(f"{n} records")
if n < 2:
    print("Not enough data")
    sys.exit()
x = 1    # index 0 is the header
while x < n:
    # look ahead so that the last record is not dropped
    if x + 1 < n and datalist[x][0] == datalist[x+1][0]:   # duplicate?
        print(",".join(datalist[x] + datalist[x+1][1:]))
        x = x + 2    # skip duplicate
    else:
        print(",".join(datalist[x]))
        x = x + 1

Produces:

14 records
1,data1,data2
2,data3,data4
3,data5,data6
4,data7,data8
5,data9,data10,data11,data12
6,data13,data14
7,data15,data16
8,data17,data18,data19,data20
9,data21,data22,data23,data24
10,data25,data26

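That prints the merged records rather than writing them out as a CSV file, but I hope that is close enough. I am not sure exactly what you need as output. Note that the data has a header row (the record count printed at the top includes it), and that comparing adjacent records this way assumes the file is sorted by key with no more than two records per key. Also note that it reads (I believe) the entire file into memory. That should be okay (computers have ample resources compared to half a century ago) unless it is a really big file.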
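
If you do need a real CSV file as output, or the file is too big to hold in memory, something along these lines might work. It is only a sketch under assumptions of mine: the function name merge_duplicates is made up, the key is taken to be the first column, the file is taken to be sorted by key, and every run of records with the same key (two or more) is merged into one row. It uses csv.writer for the output and itertools.groupby so that only one group of records is held in memory at a time.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def merge_duplicates(infile, outfile):
    """Merge consecutive rows sharing a key (column 0) into one row.

    Hypothetical helper: assumes the input is sorted by key and that
    the first row is a header, which is copied through unchanged.
    """
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    header = next(reader, None)
    if header is None:
        return  # empty input, nothing to write
    writer.writerow(header)
    rows = (row for row in reader if row)  # ignore blank lines
    for key, group in groupby(rows, key=itemgetter(0)):
        merged = [key]
        for row in group:
            merged.extend(row[1:])  # append the data fields only
        writer.writerow(merged)


# Small in-memory demonstration using part of the sample data.
sample = "key,f1,f2\n4,data7,data8\n5,data9,data10\n5,data11,data12\n"
result = io.StringIO()
merge_duplicates(io.StringIO(sample), result)
print(result.getvalue(), end="")
```

With real files you would pass two file objects opened with newline='' instead of the StringIO objects. Be aware that the merged rows have more fields than the header, so whatever reads the output has to cope with rows of different lengths.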


Messages In This Thread
RE: Dealing with duplicated data in a CSV file - by SamHobbs - Sep-05-2021, 01:01 AM
