Python Forum

How to remove duplicates basis keys of a csv file
Hi,

I have a CSV file with a data structure like this:

key bhk Area Property_Type
310935 2 BHK 47.32 APARTMENT
310935 2 BHK 47.43 APARTMENT
310935 2 BHK 47.86 APARTMENT
310935 2 BHK 49.8 APARTMENT
310817 1BHK 28.56 APARTMENT
310817 1BHK 30.9 APARTMENT
310817 1BHK 30.9 APARTMENT
310817 1BHK 31.45 APARTMENT
310803 1BHK 25.92 APARTMENT
310803 1BHK 30.21 APARTMENT


Now I want to remove duplicates from the Area column, but on a per-key basis: a single key must not have duplicate Area values. The same Area may appear under other keys, just not twice within the same key.

I am trying to write this but can't work out the logic behind it.

This is my code:

import csv
OUTPUT_FILE = 'Desired_format.csv'
filename = "optionsbook.csv"
sublist = []
with open("./"+ filename, "r") as file,open(OUTPUT_FILE, 'w') as f_out:
    reader = csv.DictReader(file)
    for line in reader:
        line["key"] = line["bhk"],line["Area"],line["Property_Type"]
        if line["Area"] in line:
            continue
        else:
            sublist.append(line["key"])
Just memorize (key, area) in a set.

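seen = set()
# data is any iterable of dict-like rows, e.g. a csv.DictReader
for row in data:
    memorize = (row['key'], row['Area'])
    if memorize in seen:
        # this (key, Area) pair was already printed, skip it
        continue
    else:
        seen.add(memorize)
        print(row)
del seen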
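A fuller sketch of the same idea, in case it helps to see it end to end. It assumes the real file is comma-separated with the header key,bhk,Area,Property_Type; the SAMPLE text and the dedupe_rows helper are made up for illustration and stand in for optionsbook.csv and your own code.

```python
import csv
import io

# Hypothetical comma-separated sample standing in for optionsbook.csv.
SAMPLE = """key,bhk,Area,Property_Type
310935,2 BHK,47.32,APARTMENT
310817,1BHK,30.9,APARTMENT
310817,1BHK,30.9,APARTMENT
310803,1BHK,30.9,APARTMENT
"""


def dedupe_rows(rows, fields=("key", "Area")):
    """Yield each row whose (key, Area) pair has not been seen before."""
    seen = set()
    for row in rows:
        marker = tuple(row[name] for name in fields)
        if marker in seen:
            continue
        seen.add(marker)
        yield row


reader = csv.DictReader(io.StringIO(SAMPLE))
unique_rows = list(dedupe_rows(reader))

# Write the surviving rows back out in CSV form (to a string here).
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
writer.writeheader()
writer.writerows(unique_rows)
print(out.getvalue())
```

With real files you would replace the io.StringIO objects with open(filename, newline="") and open(OUTPUT_FILE, "w", newline=""). Note that the values are compared as strings, so 30.9 and 30.90 would count as different areas.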
By the way, the example data you provided is inconsistent.
It's not comma-separated, and if you use whitespace as the delimiter,
you'll get 5 columns for key 310935 and 4 columns for the rest,
because there is a space between 2 and BHK.
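The mismatch is easy to check by splitting two of the posted lines on whitespace. This is only an illustration using rows copied from the example data; the variable names are made up.

```python
# Two rows copied from the posted example data.
lines = [
    "310935 2 BHK 47.32 APARTMENT",
    "310817 1BHK 28.56 APARTMENT",
]

# str.split() with no argument splits on any run of whitespace.
counts = [len(line.split()) for line in lines]
print(counts)  # [5, 4]
```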
Sorry, that was a typo. Nice example, though. I am trying to fit your example into my code. Thank you so much.