Python Forum

Removal of duplicates
Hi everyone,

I have written the anagram finder below, which reads its words from a words.txt file.

with open('words.txt', 'r') as read:
    line = read.readlines()

def make_anagram_dict(line):
    word_list = {}

    for word in line:
        word = word.lower()
        key = ''.join(sorted(word))
        if key in word_list and len(word) > 5 and word not in word_list:
            word_list[key].append(word)
        else:
            word_list[key] = [word]

    return word_list

if __name__ == '__main__':
    word_list = make_anagram_dict(line)

    for key, words in word_list.items():
        if len(words) > 1:
            print('Key value' + ' '*len(key) + '|     words')
            print(key + ' '*len(key) + ':' + str(words))
            print('---------------------------------------------')
The output I get looks like this (a random part of it):

Output:
Key value | words
hortwy :['worthy\n', 'wrothy\n']
---------------------------------------------
The problem is that the words.txt file contains duplicates that differ only by a capital letter at the start, e.g. Zipper and zipper. The code therefore reports zipper as an anagram of itself when it shouldn't. I tried to fix it with the part in bold. I would really appreciate any help!
One way to eliminate duplicates would be to convert the list to a set, then convert it back again.
1. Read the words into a list.
2. Convert the items to lower case.
3. Copy the list to a set.
4. Copy the set back to a list.
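
The four steps above could be sketched roughly as follows. This is only a minimal sketch, not a definitive fix: the helper name unique_lowercase_words is made up for illustration, the sample list stands in for the lines read from words.txt, and the strip() call is an assumption that the trailing newlines seen in the output are not wanted. sorted() is used in place of a plain list() in step 4 so that the result has a predictable order, since a set has none.

```python
def unique_lowercase_words(lines):
    """Return the distinct words in lines, lower-cased, without surrounding whitespace."""
    # Steps 1-2: take the list of lines, strip the newline, lower-case each word.
    words = [line.strip().lower() for line in lines]
    # Steps 3-4: a set drops the duplicates; sorted() turns it back into a list.
    # Empty strings (from blank lines) are skipped.
    return sorted({word for word in words if word})


def make_anagram_dict(words):
    """Group words under a key made of their sorted letters."""
    anagrams = {}
    for word in words:
        key = ''.join(sorted(word))
        # setdefault creates the list on first use, then we always append,
        # so an existing group is never overwritten.
        anagrams.setdefault(key, []).append(word)
    return anagrams


# Stand-in for: with open('words.txt') as f: sample = f.readlines()
sample = ['Zipper\n', 'zipper\n', 'worthy\n', 'wrothy\n']

unique = unique_lowercase_words(sample)
print(unique)  # ['worthy', 'wrothy', 'zipper']

for key, group in make_anagram_dict(unique).items():
    if len(group) > 1:
        print(key, ':', group)  # hortwy : ['worthy', 'wrothy']
```

Because Zipper and zipper collapse into a single entry before the dictionary is built, zipper ends up alone in its group and is no longer printed as an anagram of itself.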