Python Forum

Hi,

I have a CSV file with a data structure like this:

key bhk Area Property_Type
310935 2 BHK 47.32 APARTMENT
310935 2 BHK 47.43 APARTMENT
310935 2 BHK 47.86 APARTMENT
310935 2 BHK 49.8 APARTMENT
310817 1BHK 28.56 APARTMENT
310817 1BHK 30.9 APARTMENT
310817 1BHK 30.9 APARTMENT
310817 1BHK 31.45 APARTMENT
310803 1BHK 25.92 APARTMENT
310803 1BHK 30.21 APARTMENT

Now I want to remove duplicates from the Area column, but on a per-key basis: a single key must not have duplicate Area values. The same Area may appear under different keys, but not more than once under the same key.

I am trying to write this, but I can't work out the logic.

This is my code:

import csv
OUTPUT_FILE = 'Desired_format.csv'
filename = "optionsbook.csv"
sublist = []
with open("./"+ filename, "r") as file,open(OUTPUT_FILE, 'w') as f_out:
    reader = csv.DictReader(file)
    for line in reader:
        line["key"] = line["bhk"],line["Area"],line["Property_Type"]
        if line["Area"] in line:
            continue
        else:
            sublist.append(line["key"])

Just remember each (key, Area) pair in a set and skip any row whose pair you have already seen.

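seen = set()
# data is any iterable of row dicts, e.g. the csv.DictReader from your code
for row in data:
    memorize = (row['key'], row['Area'])
    if memorize in seen:
        # this Area already occurred for this key, so skip the row
        continue
    else:
        seen.add(memorize)
        print(row)
del seen  # optional: the set is no longer needed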
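
For reference, here is a rough sketch of how the set idea could be combined with csv.DictReader and csv.DictWriter. It is only one way to do it. The function name drop_duplicate_areas is made up for this example, and the sample text assumes the file really is comma-separated with the header key,bhk,Area,Property_Type. io.StringIO stands in for the real files so the snippet runs on its own.

```python
import csv
import io


def drop_duplicate_areas(rows):
    """Yield only rows whose (key, Area) pair has not been seen before.

    Hypothetical helper for this example, not part of the csv module.
    """
    seen = set()
    for row in rows:
        pair = (row["key"], row["Area"])
        if pair in seen:
            continue
        seen.add(pair)
        yield row


# Assumption: a comma-separated version of some rows from the question.
SAMPLE = """key,bhk,Area,Property_Type
310817,1BHK,28.56,APARTMENT
310817,1BHK,30.9,APARTMENT
310817,1BHK,30.9,APARTMENT
310817,1BHK,31.45,APARTMENT
310803,1BHK,25.92,APARTMENT
310803,1BHK,30.21,APARTMENT
"""

reader = csv.DictReader(io.StringIO(SAMPLE))
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(drop_duplicate_areas(reader))

result = out.getvalue()
print(result)
```

With real files you would open optionsbook.csv for reading and Desired_format.csv for writing instead of using io.StringIO, and pass newline="" to both open() calls, as the csv module documentation recommends.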

By the way, the provided example data is wrong.
It's not comma-separated, and if you use whitespace as the delimiter,
you'll get 5 columns for key 310935 and 4 columns for the rest,
because there is a space between 2 and BHK.
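
A quick way to see the column mismatch is to split two of the posted lines on whitespace and count the pieces. The variable names here are just for illustration.

```python
# Two lines copied from the example data in the question.
lines = [
    "310935 2 BHK 47.32 APARTMENT",
    "310817 1BHK 28.56 APARTMENT",
]

# str.split() with no arguments splits on any run of whitespace.
column_counts = [len(line.split()) for line in lines]
print(column_counts)  # [5, 4]
```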

Sorry, that was a typo. Nice example, though. I am trying to fit your example into my code. Thank you so much.

Prince_Bhatia

DeaD_EyE

Prince_Bhatia