Python Forum
pymongo diff type problem to find images on two drives
#1
I would like to compare two photo collections on two different hard drives so that I can locate any images on drive 2 that are not already on drive 1 and copy them to drive 1. I already have two MongoDB collections, and each document stores a hash value computed earlier from its image, so I can compare by this hash. My code seems very slow (maybe 40 minutes to check 40000 documents against 40000 others), so I am wondering how to do this faster. Like one minute? I just coded what seemed like a simple way to do this, and now I would like to know how to see which operation is consuming the most time. I also think I need a whole new approach here. Maybe MongoDB can do this super quick on its own?

# coll (drive 1) and coll2 (drive 2) are existing pymongo collections
z2 = coll2.find({})
for doc in z2:
    somehash = doc["hashvalue"]  # hash of an image on drive 2, to compare against drive 1
    # can we find the hash from the drive 2 collection in the drive 1 collection?
    # count_documents replaces Cursor.count(), which was removed in PyMongo 4
    match_count = coll.count_documents({"hashvalue": somehash})
    print('query count is', match_count)

    # True: a match exists, so the image is a duplicate
    # False: the image on drive 2 has no match on drive 1 (so inspect it later)
    is_dupe = match_count > 0  # avoids shadowing the built-in name bool

    coll2.update_one(
        {'_id': doc['_id']},
        {'$set': {'isdupe': is_dupe}}
    )
I was wondering if I should have just built an array of hashes from each collection and then worked with that in Python alone (maybe just using sets), and then, once I had found the non-duplicate hashes, gone back to find those documents in the data. Also, I did not sort anything, but I thought the MongoDB search would not be hampered by that. Maybe I'm wrong about that. Thanks.
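For reference, the set-based idea described above could be sketched roughly as follows. This is only a minimal sketch, not a tested solution for this data: the function names find_missing_hashes and flag_duplicates are made up for illustration, and it assumes every document in both collections has a "hashvalue" field and that all hashes fit comfortably in memory (which should hold for about 40000 documents per collection).

```python
def find_missing_hashes(drive1_docs, drive2_docs, key="hashvalue"):
    """Return the hashes that appear in drive2_docs but not in drive1_docs.

    Both arguments are iterables of dict-like documents, for example
    pymongo cursors or plain lists of dicts.
    """
    drive1_hashes = {doc[key] for doc in drive1_docs}
    drive2_hashes = {doc[key] for doc in drive2_docs}
    # Set difference does the whole comparison in memory in one step.
    return drive2_hashes - drive1_hashes


def flag_duplicates(coll, coll2, key="hashvalue"):
    """Hypothetical helper: set 'isdupe' on every document in coll2.

    coll is assumed to be the drive 1 collection, coll2 the drive 2 one.
    """
    # Fetch only the hash field to keep the transfer small.
    projection = {key: 1, "_id": 0}
    missing = find_missing_hashes(
        coll.find({}, projection), coll2.find({}, projection), key
    )
    # Mark everything as a duplicate first, then clear the flag
    # for the hashes that drive 1 does not have.
    coll2.update_many({}, {"$set": {"isdupe": True}})
    if missing:
        coll2.update_many(
            {key: {"$in": list(missing)}}, {"$set": {"isdupe": False}}
        )
    return missing
```

This replaces tens of thousands of round trips with two reads and at most two bulk updates. If the original per-document loop is kept instead, an index on the hash field, e.g. coll.create_index("hashvalue"), may be worth trying, since without one each lookup probably has to scan the whole drive 1 collection.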