Python Forum
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Removing duplicate Image
#1
So I export a Telegram Channel's All Images and the result is one original image and it's thumbnail as you can see in pic

[Image: pyth.png]

I tried to rename them and delete all odd or even ones but that didn't work well for all Images
Do someone have any other idea?

I don't know if I explained it correctly or not
Reply
#2
Hi,
could you share your code? It may be easier to help you if we could view what you are doing.
But even without code it seems that the thumbnail images are named <name_of_the_image>_thumb. You are probably going to fetch a list of all images in that chat. After that you could filter via list comprehension to remove the thumbnail images.
Reply
#3
(Jan-23-2020, 02:34 PM)ThiefOfTime Wrote: Hi,
could you share your code? It may be easier to help you if we could view what you are doing.
But even without code it seems that the thumbnail images are named <name_of_the_image>_thumb. You are probably going to fetch a list of all images in that chat. After that you could filter via list comprehension to remove the thumbnail images.
What is list comprehension?
Reply
#4
A list comprehension is a neat way to combine for-loops with lists (or many other iterable structures). You can generate a new list out of an old list (even with conditions) by iterating through each element of the old list all in one line. For example:
Lets say we have the list numbers = [1, 2, 3, 4, 5, 7, 9] and we want to only keep the odd numbers. Without list comprehension we would do this:
numbers = [1, 2, 3, 4, 5, 7, 9]
odds = []
for n in numbers:
    if n % 2 == 1:
        odds.append(n)
Which would modify the list odds so that it would look like this: odds = [1, 3, 5, 7, 9]
Now with list comprehensions it would look like this:
numbers = [1, 2, 3, 4, 5, 7, 9]
odds = [n for n in numbers if n % 2 == 1]
As you can see we squashed the complete for-loop together with the initialization of the odds list in one line. Of course depending of the use case the if-cause would be optional, but in this case and your presented problem you would need it. There are even some more tricks that can be done with list comprehensions, but this is all you need for now.
So list comprehensions help you to filter your data using one line of code and in a nice readable way.
Here is a nice tutorial on how to use list comprehensions: https://realpython.com/list-comprehension-python/
Reply
#5
The only problem with a set loop is that as you remove files the count will change. I think I would walk though the directory and read the last 5 characters of the file name string and if it =="thumb" then remove it.

I would have to sit down to write out the code( and the boss might get upset if I did that now Smile ) but I know I've done something similar before.
Reply
#6
It kind of depends on where he is getting the image names from. When I am not mistaken he is requesting them from the telegram API followed by a download. Therefore he is getting a list (or a similar datastructure) right away.
Also the images are not deleted from the original list so this will work fine :) The for-loops also allow elements to be deleted since the for-loop uses the iterator to move over the iterable structure it does not care if already seen items are deleted.
Reply
#7
ThiefOfTime That makes sense, I'll sure you're probably right
Reply
#8
Removing duplicate files is normal to use Hash-based verification.
With Python can eg use hashlib.
So can example do it like this.
import hashlib
import os

def remove_duplicate(path):
    unique = {}
    os.chdir(path)
    for file in os.scandir(path):
        with open(file.name, 'rb') as f:
            filehash = hashlib.md5(f.read()).hexdigest()
            if filehash not in unique:
                unique[filehash] = file.name
            else:
                # Test print before removing
                print(f'Removing --> {unique[filehash]}')
                #os.remove(unique[filehash])

if __name__ == '__main__':
    path = r'C:\div_code\img'
    remove_duplicate(path)
Reply
#9
(Jan-24-2020, 12:28 PM)snippsat Wrote: Removing duplicate files is normal to use Hash-based verification.
With Python can eg use hashlib.
So can example do it like this.
import hashlib
import os

def remove_duplicate(path):
    unique = {}
    os.chdir(path)
    for file in os.scandir(path):
        with open(file.name, 'rb') as f:
            filehash = hashlib.md5(f.read()).hexdigest()
            if filehash not in unique:
                unique[filehash] = file.name
            else:
                # Test print before removing
                print(f'Removing --> {unique[filehash]}')
                #os.remove(unique[filehash])

if __name__ == '__main__':
    path = r'C:\div_code\img'
    remove_duplicate(path)

Thanks Dude but The Pics are not duplicate the thumbnail of the Image is smaller in size and dimension so it has a different hash

(Jan-23-2020, 09:19 PM)benlyboy Wrote: The only problem with a set loop is that as you remove files the count will change. I think I would walk though the directory and read the last 5 characters of the file name string and if it =="thumb" then remove it.

I would have to sit down to write out the code( and the boss might get upset if I did that now Smile ) but I know I've done something similar before.

Thank you so much Dude for your Idea.
Thanks to everyone for responding on my thread.
Finally all thumbnail Images are removed
Here is my code:

import os

dir = input("Enter folder location: ")

os.chdir(dir)

for files in list(os.listdir()):
    if files[-9:-4] == 'thumb':
        os.remove(files)

print("Thumbnails Removed")
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Removing duplicate list items eglaud 4 2,644 Nov-22-2019, 08:07 PM
Last Post: ichabod801
  removing duplicate numbers from a list calonia 12 5,137 Jun-16-2019, 12:09 PM
Last Post: DeaD_EyE

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020