Python Forum
Removing duplicate list items - Printable Version

Removing duplicate list items - eglaud - Nov-19-2019

Hello, I am writing code that lets the user paste copied Excel cells into a function that returns all the entries separated by commas, properly capitalized, and without duplicates. My code is below (minus the part that fixes capitalization). I noticed that, as written, if I type "blacknose dace, blacknose dace" it still returns both. If I paste the Excel cells multiple times, it also fails to remove duplicates. I am a very novice coder and am not sure what to do. Any tips?

from collections import OrderedDict #gives us a tool to order the list we create


print ("Type 'I()' to start the historic fish tool. Type 'stop' to end the loop") #instructions
print ("Note: please PASTE the excel data. You may type it, but commas won't be placed.")

def I():   #When we type "I()" the historic fish tool runs, letting us paste excel data to get a non-duplicated,
           #properly capitalized fish species list

    Fish = "Hi"     #just defines the Fish variable so we may start the loop below
    while Fish != "STOP":    #unless you type "stop" in any capitalization, this code will loop, letting the user
                             #put in multiple sites' data throughout the session
        print('')
        Fish = raw_input("Paste excel data: ").upper()     #lets us input our fish species, makes input uppercase
        Fi = Fish.split("\n")   #creates a list based off our fish species
        Fi = Fish.split(",")
        Fi = list (OrderedDict.fromkeys(Fi))   #removes duplicates
        Fi.sort()    #sorts the species alphabetically
        print '[%s]' % ', '.join(map(str, Fi)) #removes quotation marks and prints list



RE: Removing duplicate list items - ichabod801 - Nov-19-2019

'spam, spam'.split(',') gives you ['spam', ' spam']. That extra space in the second one makes them not duplicates. You need to strip each word. This can be done with a list comprehension: Fi = [word.strip() for word in Fish.split(',')]
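
Here is a minimal sketch of the stripping step, splitting on commas and then calling strip() on each piece. The variable names raw, parts and cleaned are made up for illustration, and the parenthesized print calls are written so the snippet also runs on Python 3.

```python
# Splitting on commas keeps the space that followed each comma.
raw = 'spam, spam'
parts = raw.split(',')
print(parts)    # ['spam', ' spam']

# Stripping each piece makes the two entries identical.
cleaned = [word.strip() for word in parts]
print(cleaned)  # ['spam', 'spam']
```

Once the entries compare equal, any duplicate-removal step can actually see them as duplicates.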

Note that line 15 does nothing. First of all, there will be no newline in the text, since raw_input only accepts one line of input. Second of all, you store the result in Fi, but then overwrite that on line 16. Fish doesn't change when you split it, and you couldn't really split Fi, because it's a list at that point, not a string.

You don't need OrderedDict. You sort the result after making a list of it, so you lose any order stored in OrderedDict. So you can just use set to remove the duplicates.
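
A quick sketch of the set approach, assuming the input has already been upper-cased as in the original code. The sample string is invented for illustration, and sorted() is used here as a shortcut for making a list and then calling sort() on it.

```python
# Sample input, made up for illustration (already upper-cased).
Fish = 'BLACKNOSE DACE, CREEK CHUB, BLACKNOSE DACE'

# Split on commas and strip the leftover spaces.
Fi = [word.strip() for word in Fish.split(',')]

# set() drops the duplicates; sorted() returns an alphabetized list.
Fi = sorted(set(Fi))
print(Fi)  # ['BLACKNOSE DACE', 'CREEK CHUB']
```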

Finally, you are using version 2.7. You need to upgrade to the latest version. End of life for 2.7 is in 42 days.


RE: Removing duplicate list items - eglaud - Nov-19-2019

For example,
Paste excel data: blacknose dace
Blacknose dace
returns
[Blacknose dace
Blacknose dace]
and
Paste excel data: blacknose dace, Blacknose dace, blacknose dace blacknose dace
Blacknose dace
returns
[ Blacknose dace,  Blacknose dace Blacknose dace
Blacknose dace, Blacknose dace]
I would like all of these to just return "Blacknose dace"

Thanks Craig! I read all of this and realize that my problems would be moot if I just follow the code how I wrote it, and JUST paste one section of excel data. I have a few questions, however.
(Nov-19-2019, 06:59 PM)ichabod801 Wrote: You don't need OrderedDict. You sort the result after making a list of it, so you lose any order stored in OrderedDict. So you can just use set to remove the duplicates.

Finally, you are using version 2.7. You need to upgrade to the latest version. End of life for 2.7 is in 42 days.

OrderedDict may not be the best option, but it does seem to remove my duplicates (when used properly). I'm unsure how to use set, though. I tried it, but it split all my words into letters. I understand the desire to code efficiently, but what I have does seem to work.

As for the 2.7, I have no idea how you could tell, but will my code not work in the latest version? I'm using my work's IDLE program.


RE: Removing duplicate list items - eglaud - Nov-22-2019

(Nov-19-2019, 06:59 PM)ichabod801 Wrote: You don't need OrderedDict. You sort the result after making a list of it, so you lose any order stored in OrderedDict. So you can just use set to remove the duplicates.

Finally, you are using version 2.7. You need to upgrade to the latest version. End of life for 2.7 is in 42 days.

Hey Craig, sorry if you already got this notification, but I wasn't sure whether I posted a new reply or edited one of my earlier replies. Would you mind looking at what I wrote about OrderedDict and 2.7 in the quote portion of my other reply? Thanks so much.


RE: Removing duplicate list items - ichabod801 - Nov-22-2019

If you give set the full string, it will break it into characters. But if you give it a list of strings, it won't break the individual strings apart:

Fi = Fish.split(',')
Fi = list(set(Fi))
Fi.sort()
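
To see the difference side by side, here is a small sketch. The names word, letters and whole are just for illustration.

```python
word = 'dace'

# A bare string is treated as a sequence of characters.
letters = sorted(set(word))
print(letters)  # ['a', 'c', 'd', 'e']

# A list containing the string keeps the string in one piece.
whole = sorted(set([word]))
print(whole)    # ['dace']
```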

I can tell you are using 2.7 because you use raw_input. In 3.0+, input is what raw_input was in 2.7, and raw_input doesn't exist, so calling it would give you a NameError in the latest version. (The print statement on the last line is also Python 2 syntax and would be a SyntaxError in 3.) If 2.7 is what you are using at work, you should point out to them that it will receive no security updates after the end of the year, and they should upgrade to the newest version. But note that 3.0 broke backward compatibility, so you may need to update your code first.
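
For reference, here is a rough sketch of how the cleanup might look on Python 3. The helper name clean_species and the sample string are made up for illustration. Treating newlines like commas is an assumption that only matters if the pasted rows arrive as a single string, since input() still reads one line at a time.

```python
def clean_species(text):
    """Return the unique, upper-cased, sorted entries in a pasted string.

    clean_species is a hypothetical helper name, not part of the original code.
    """
    # Treat newlines like commas so pasted rows and typed lists both work.
    pieces = text.upper().replace('\n', ',').split(',')
    # Strip surrounding whitespace and drop empty entries.
    names = [piece.strip() for piece in pieces if piece.strip()]
    # set() removes duplicates; sorted() alphabetizes the result.
    return sorted(set(names))


# In Python 3, input() replaces raw_input() and print is a function:
#     Fish = input("Paste excel data: ")
sample = 'blacknose dace, Blacknose dace\nblacknose dace'
print('[%s]' % ', '.join(clean_species(sample)))  # [BLACKNOSE DACE]
```

Keeping the cleanup in its own function means the loop only has to read a line, check for "STOP", and print the result.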