Python Forum
detect equal sequences in list
#1
Hello,
I'm a beginner in Python and I'm trying to find the start (picture_nr_hanger_start) and the end (picture_nr_hanger_end) of each run of equal values in the list "hash_list_film".
For now the list "hash_list_film" is filled with numbers (later the numbers will be replaced with hash strings):
hash_list_film = ["12", "11", "11", "11", "17", "22", "22", "22", "22", "23"]

What I'm trying to achieve is to detect runs of equal numbers (film hangers) in hash_list_film.
Later I'm planning to extract the pictures of a film and examine them with a perceptual hash; picture_nr_hanger_start and picture_nr_hanger_end shall be returned.
In the list above, the hanging positions are 1 to 3 and 5 to 8.

My problem is:
What am I doing wrong with hash_list_film in line 29?
Could you please tell me what is wrong in the code?

I will be very thankful for any help!!

Greetings, flash77

Error:
Traceback (most recent call last):
  File "D:\python-Programmierung\BildNr_to_Zeit\main.py", line 29, in <module>
    picture_nr_hanger_start, picture_nr_hanger_end = HangerFinder.detect_picture_nr(start_examine, hash_list_film)
TypeError: HangerFinder.detect_picture_nr() missing 1 required positional argument: 'hash_list_film'
class HangerFinder:
    def __init__(self, start_examine, hash_list_film, end_of_list, picture_nr_hanger_start, picture_nr_hanger_end):
        self.start_examine = start_examine
        self.hash_list_film = hash_list_film
        self.end_of_list = end_of_list
        self.picture_nr_hanger_start = picture_nr_hanger_start
        self.picture_nr_hanger_end = picture_nr_hanger_end

    def detect_picture_nr(self, start_examine, hash_list_film):
        end_of_list = len(hash_list_film) - 1
        for i in range(start_examine, end_of_list):
            # detects picture_nr,  start hanger
            if i + 1 > end_of_list:
                break
            else:
                if hash_list_film[i + 1] == hash_list_film[i]:
                    picture_nr_hanger_start = i
                    for j in range(i, end_of_list):
                        if hash_list_film[j + 1] != hash_list_film[j]:
                            picture_nr_hanger_end = j
                            start_examine = j + 1
        return picture_nr_hanger_start, picture_nr_hanger_end


hash_list_film = ["12", "11", "11", "11", "17", "22", "22", "22", "22", "23"]
start_examine = 0
print(hash_list_film)

picture_nr_hanger_start, picture_nr_hanger_end = HangerFinder.detect_picture_nr(start_examine, hash_list_film)
Reply
#2
The error occurs because there is no instance of the class HangerFinder. When detect_picture_nr is called directly on the class, the first argument (start_examine) is consumed as self, the second (hash_list_film) lands in the start_examine slot, and nothing is left for the hash_list_film parameter.
The usual approach is to create an instance of HangerFinder, passing it all the required arguments, and then call detect_picture_nr on that instance, for example:
...
hanger_finder = HangerFinder(start_examine, hash_list_film, end_of_list, picture_nr_hanger_start, picture_nr_hanger_end)
picture_nr_hanger_start, picture_nr_hanger_end = hanger_finder.detect_picture_nr(start_examine, hash_list_film)
I didn't look too closely at your class HangerFinder, but I think it's questionable: the method detect_picture_nr does not use self in any way, and it takes the same parameters that are passed to __init__. As it stands, you probably just need a function.
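To make the point about self concrete, here is a minimal sketch. The Greeter class and its greet method are made up purely for illustration and have nothing to do with the film code; they only show how the arguments shift by one position when a method is called on the class instead of on an instance.

```python
class Greeter:
    """Hypothetical class, used only to show how self gets bound."""

    def greet(self, greeting, name):
        return f"{greeting}, {name}!"


# Called on the class: "Hello" is taken as self, "World" as greeting,
# and nothing is left for name, so Python raises a TypeError.
try:
    Greeter.greet("Hello", "World")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Called on an instance: Python passes the instance as self automatically.
print(Greeter().greet("Hello", "World"))  # Hello, World!
```

The exact wording of the TypeError varies a little between Python versions, but it names the last parameter as missing, just like the traceback in the first post.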
Reply
#3
Two years posting to this forum disqualifies you as a python beginner. That's about how long I've been using Python.

You don't seem to fully understand classes. It looks like you think they work just like functions.

Normally you don't work directly with classes, you work with objects. To get a HangerFinder object you need to create an instance of class HangerFinder. After creating the object you use the object to call the detect_picture_nr() method. The code should look something like this:
class HangerFinder:
    def __init__(self, hash_list_film):
        """This is called when you create an instance of the class.  Initialize things here."""
        # Would this be a good place to make the hash list?  Pass in a film list and compute the hash list?
        self.hash_list = hash_list_film
        self.start = 0

    def detect_picture_nr(self, start=None, end=None):
        """ This is a method that returns the start and end index of a matching sequence.
        It uses instance variables that were created by the __init__().
        """
        if end is not None:
            end = min(end, len(self.hash_list))
        else:
            end = len(self.hash_list)
        if start is not None:
            self.start = max(0, min(end, start))
        start = self.start

        if start >= end:
            return None

        for i in range(start+1, end):
            if self.hash_list[i] != self.hash_list[i-1]:
                self.start = i
                return start, i-1
        self.start = end
        return start, end-1


# Create an instance
finder = HangerFinder(["12", "11", "11", "11", "17", "22", "22", "22", "22", "23"])

# Use the instance
while (range_ := finder.detect_picture_nr()):
    print(range_)
Output:
(0, 0)
(1, 3)
(4, 4)
(5, 8)
(9, 9)
But you really don't need a class for something like this. I would write it as a generator.
def hangers(frame_list, hash_func):
    """Yield lists of sequential frames that have the same hash value"""
    if frame_list:
        start = 0
        prev_hash = hash_func(frame_list[start])
        for index, frame in enumerate(frame_list[1:], start=1):
            # Named next_hash so the built-in next() is not shadowed
            next_hash = hash_func(frame)
            if next_hash != prev_hash:
                yield frame_list[start:index]
                start = index
                prev_hash = next_hash
        yield frame_list[start:]

# Using your number list to represent a frame list.  Using hash() in place of your hash function
for hanger in hangers(["12", "11", "11", "11", "17", "22", "22", "22", "22", "23"], hash):
    print(hanger)
Output:
['12']
['11', '11', '11']
['17']
['22', '22', '22', '22']
['23']
Here is the same function without requiring the frames to be indexable. This lets you pass a lazy iterator instead of having to build a big list containing all the frames.
def hangers(frames, hash_func):
    """Return lists of sequential frames with the same hash value
    frames: An iterable of things to hash.
    hash_func: Function that returns a hash value.
    """
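    frame_iter = iter(frames)
    try:
        seq = [next(frame_iter)]
    except StopIteration:
        # Empty input: a bare StopIteration inside a generator would
        # surface as RuntimeError, so just end the generator here.
        return
    prev_hash = hash_func(seq[0])
    for frame in frame_iter:
        next_hash = hash_func(frame)
        if prev_hash != next_hash:
            yield seq
            seq = []
            prev_hash = next_hash
        seq.append(frame)
    yield seq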

# Using your number list to represent a frame list.  Using hash() in place of your hash function
for hanger in hangers(iter(["12", "11", "11", "11", "17", "22", "22", "22", "22", "23"]), hash):
    print(hanger)
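If index pairs are wanted rather than the frames themselves, the standard library's itertools.groupby can do the grouping. This is only a sketch of an alternative, not what the code above does; the name run_indices is my own invention.

```python
from itertools import groupby


def run_indices(values, key=None):
    """Yield (start, end) index pairs for each run of equal consecutive items.

    key is an optional function (for example a hash function) applied to
    each item before comparing, as with itertools.groupby itself.
    """
    start = 0
    for _, group in groupby(values, key=key):
        length = sum(1 for _ in group)  # consume the group to count its items
        yield start, start + length - 1
        start += length


hash_list_film = ["12", "11", "11", "11", "17", "22", "22", "22", "22", "23"]
print(list(run_indices(hash_list_film)))
# [(0, 0), (1, 3), (4, 4), (5, 8), (9, 9)]

# Keep only runs longer than one frame, i.e. the actual hangers.
hangers_only = [(s, e) for s, e in run_indices(hash_list_film) if e > s]
print(hangers_only)  # [(1, 3), (5, 8)]
```

The filtered result matches the hanging positions named in the first post (1 to 3 and 5 to 8).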
Reply
#4
Hi Yoriz and deanhystad,

thanks for your detailed help!

But @ deanhystad:

(Oct-14-2022, 07:07 PM)deanhystad Wrote: Two years posting to this forum disqualifies you as a python beginner. That's about how long I've been using Python.

You got the wrong tone...

flash77
Reply
#5
Look at all you've done. GUI development, game development, multi-processing. You may not use python every day, but you have too much experience to still think of yourself as a beginner. You may not think you know a lot, but you've already shown here that you do. I'm sorry that didn't come out right the first time. I meant respect, not disrespect.
Reply
#6
Dear deanhystad,

Because I'm not a native English speaker, I struggle with the English language.

And because of this, I totally misinterpreted your sentence.

Thank you for clarifying matters...

Good night...

flash77
Reply
#7
Quote:Hanging position is: 1 to 3 and 5 to 8

As I understand it, you want the start and end positions of each sequence of the same number in the list hash_list_film.

I don't understand what you are using hash for; maybe you are hashing a tuple? But that wouldn't make a difference.

sequences is a list of lists, each holding the start position of a sequence of the same number and its end position + 1.

Thus, you can take each sub-list, loop through the numbers that are the same, and do whatever you wish.

If hash_list_film ends with a sequence, that makes things awkward for my little function, so just append an X as a sentinel anyway.

What you do with the data in sequences is up to you!

def myApp():
    # The trailing "X" is a sentinel so that a sequence at the end of the list is still closed
    hash_list_film = ["12", "11", "11", "11", "17", "22", "22", "22", "22", "23", "24", "24", "24", "24", "X"]
    sequences = []
    count = 0
    def getSequence(start):
        tmp = [start]
        for k in range(start + 1, len(hash_list_film)-1):
            # bail out of the k loop at the end of the sequence
            if hash_list_film[k] != hash_list_film[k+1]:
                tmp.append(k+1)
                sequences.append(tmp)
                return k+1

    while count < len(hash_list_film) - 1:
        if hash_list_film[count] == hash_list_film[count+1]:
            count = getSequence(count)
        else:
            count += 1
    print(sequences)

myApp()
Output:
[[1, 4], [5, 9], [10, 14]]
Reply
#8
I love a (simple) challenge. It's possible that I'm missing something,
but the above seems rather complicated to me.
What is wrong with:
film = ["12", "11", "11", "11", "17", "22", "22", "22", "22", "23", "24", "24", "24", "24"]
oldvalue = film[0]
startidx = 0
start_end = []
for idx, value in enumerate(film):
    if value != oldvalue:
        start_end.append([startidx, idx - 1])
        startidx = idx
    oldvalue = value
start_end.append([startidx, idx])
print(start_end)
Output:
[[0, 0], [1, 3], [4, 4], [5, 8], [9, 9], [10, 13]]
Paul
It is more important to do the right thing, than to do the thing right.(P.Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.
Reply
#9
Hello,

Many thanks to all the hard-working people who shared their knowledge!!

Reading this, I've got a starting point to work with.

I will notify you if I need assistance...

Thanks a lot...

-------------------------------------------------------------------
Just to explain (because the question arose what I'm planning to do with the list of hash codes):

Digitizing Super 8 movies for customers is part of my work. During the recording process the film can hang. I have to cut out these hanging sequences in video editing software.

Watching the digitized film and checking for hangers takes a lot of effort.

What I'm planning to do is the following:

First, I extract the frames from the digitized film.

Then, I want to examine each frame with perceptual hashing.

Each frame gets its own hash code.

When the film hangs, the same frame is digitized again and again...

This produces a run of identical hash codes.

Knowing the start and the end of a hanger, I will calculate the position of the hanger in the film (hh:mm:ss).
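The last step, turning a frame number into a position in the film, could be sketched with divmod as follows. The function names are hypothetical, and the default of 20 frames per second is an assumption taken from the figure mentioned later in this thread; it has to match the rate the film was actually digitized at.

```python
def frame_to_timecode(frame_nr, fps=20):
    """Convert a zero-based frame number to (hours, minutes, seconds, frames).

    fps=20 is an assumed frame rate; pass the real one for your files.
    """
    total_seconds, frames = divmod(frame_nr, fps)
    total_minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(total_minutes, 60)
    return hours, minutes, seconds, frames


def format_timecode(frame_nr, fps=20):
    """Format a frame number as hh:mm:ss, dropping the leftover frames."""
    hours, minutes, seconds, _ = frame_to_timecode(frame_nr, fps)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


print(frame_to_timecode(73_230))  # (1, 1, 1, 10)
print(format_timecode(73_230))    # 01:01:01
```

Each divmod call splits off one unit, so no loop is needed to search for the nearest full hour, minute or second.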

I think that perceptual hashing is more suitable for examining the frames, because it isn't too sensitive to small changes in the frame.

I tested another method and changed only one pixel in the same frame, and the changed frame was identified as a different picture.

Because of this I think that perceptual hashing is better suited for detecting the same frame (-> hanger).
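The difference can be illustrated with a toy example in pure Python. The average_hash function below is a deliberately simplified stand-in for a perceptual hash (one bit per pixel, set when the pixel is brighter than the mean); it is not the algorithm of any particular library, and the 4x4 "frame" is made-up data.

```python
import hashlib


def average_hash(pixels):
    """Toy average hash: one bit per pixel, 1 if brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming(bits_a, bits_b):
    """Number of positions where the two bit lists differ."""
    return sum(a != b for a, b in zip(bits_a, bits_b))


def exact_hash(pixels):
    """Cryptographic hash of the raw pixel values (0..255)."""
    return hashlib.sha256(bytes(p for row in pixels for p in row)).hexdigest()


# Made-up 4x4 grayscale frame and a copy with one pixel changed slightly.
frame = [[10, 200, 30, 220], [15, 210, 25, 230],
         [12, 205, 35, 215], [18, 198, 28, 225]]
changed = [row[:] for row in frame]
changed[0][0] = 11

exact_a, exact_b = exact_hash(frame), exact_hash(changed)
print(exact_a == exact_b)                                    # False
print(hamming(average_hash(frame), average_hash(changed)))  # 0
```

The exact hash treats the two frames as unrelated, while the toy perceptual hash sees no difference at all, which is the behaviour wanted for spotting a repeated frame.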
Reply
#10
Dear community,

I would be very thankful for some support with my image hashing problem.

In line 30 I calculate the Hamming distance between 2 frames. This is to decide whether they are similar or not.
In line 37 I'm trying to remove the non-hanger elements i from the list start_end.
If i[0] == i[1] or if (i[1] - i[0]) < 10, then the element i shall be removed.
This is to remove hangers that are shorter than half a second (1 second of the film contains 20 frames, so half a second contains 10 frames).

But the list hangers still contains [0,0], [2,2], [4,4], [7,7]...

What could I do?

Greetings, flash77

import openpyxl
import os
from PIL import Image
import time
import imagehash
import distance


def create_phash():
    frame_hash_list = []
    p = "D:/S8_hanger_finder/neuer_Ansatz/aktueller_Versuch/phash_test/"
    obj = os.scandir(p)
    for entry in obj:
        # load frames
        frame = Image.open(p + str(entry.name))
        # create pHash
        # Compare hashes to determine whether the frames are the same or not
        frame_phash = str(imagehash.phash(frame))
        frame_hash_list.append(frame_phash)
    obj.close()
    return frame_hash_list


def detect_hangers(frame_hash_list):
    old_value = frame_hash_list[0]
    start_idx = 0
    start_end = []
    for idx, value in enumerate(frame_hash_list):
        # get hangers with hamming distance
        hdistance = distance.hamming(list(value), list(old_value))
        if hdistance < 10:
            start_end.append([start_idx, idx - 1])
            start_idx = idx
        old_value = value
    start_end.append([start_idx, idx])
    for i in start_end:
        if i[0] == i[1] or (i[1] - i[0]) < 10:
            # remove non-hangers
            start_end.remove(i)
    number_of_hangers = len(start_end)
    return number_of_hangers, start_end


def convert_frame_nr_in_time(d):
    # S8-Movie (avi-file) is checked of hangers
    #####################################################
    # 1 hour contains 72000 frames
    c1 = 72000
    # 1 minute contains 1200 frames
    c2 = 1200
    # 1 second contains 20 frames
    c3 = 20
    # 1 hsecond (=half second) contains 10 frames
    c4 = 10

    def find_even_frame_nr(a, b, c):
        while True:
            if a % c == 0:
                break
            else:
                a -= 1
                b += 1
        return a, b

    frame_nr_full_hour, rest_1 = find_even_frame_nr(d, 0, c1)
    number_of_hours = frame_nr_full_hour / c1
    ###########################################################
    frame_nr_full_minute, rest_2 = find_even_frame_nr(rest_1, 0, c2)
    number_of_minutes = frame_nr_full_minute / c2
    ###########################################################
    frame_nr_full_second, rest_3 = find_even_frame_nr(rest_2, 0, c3)
    number_of_seconds = frame_nr_full_second / c3
    ###########################################################
    if rest_3 > 10:
        number_of_hseconds = 1
    else:
        number_of_hseconds = 0
    return number_of_hours, number_of_minutes, number_of_seconds, number_of_hseconds


measure_time_start = time.time()
frame_hash_list = create_phash()
number_of_hangers, hangers = detect_hangers(frame_hash_list)
measure_time_end = time.time()

print("frame_hash_list: " + str(frame_hash_list))
print("Zeit: " + str(measure_time_end - measure_time_start))
print("number_of_hangers: " + str(number_of_hangers))
print("hangers: " + str(hangers))

p = "D:/S8_hanger_finder/neuer_Ansatz/aktueller_Versuch/S8-Hanger_Positionen.xlsx"
fileXLSX = openpyxl.load_workbook(p)
sheet = fileXLSX["Blatt"]

r = 5
c = 2
for z in range(r, r + number_of_hangers):
    for s in range(c, c + 9):
        sheet.cell(row=z, column=s).value = None


r = 5
for i in hangers:
    frame_nr_hanger_start = i[0]
    frame_nr_hanger_end = i[1]
    number_of_hours_start, number_of_minutes_start, number_of_seconds_start, number_of_hseconds_start = convert_frame_nr_in_time(frame_nr_hanger_start)
    number_of_hours_end, number_of_minutes_end, number_of_seconds_end, number_of_hseconds_end = convert_frame_nr_in_time(frame_nr_hanger_end)
    sheet.cell(row=r, column=2).value = number_of_hours_start
    sheet.cell(row=r, column=3).value = number_of_minutes_start
    sheet.cell(row=r, column=4).value = number_of_seconds_start
    sheet.cell(row=r, column=5).value = number_of_hseconds_start
    sheet.cell(row=r, column=7).value = number_of_hours_end
    sheet.cell(row=r, column=8).value = number_of_minutes_end
    sheet.cell(row=r, column=9).value = number_of_seconds_end
    sheet.cell(row=r, column=10).value = number_of_hseconds_end
    r += 1
fileXLSX.save(p)
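A likely cause of the leftover [0,0], [2,2], ... entries is that start_end is modified while the for loop is still iterating over it: after each remove() the following element slides into the freed slot and is skipped. A minimal sketch of that effect and of one way around it (building a new list) follows. The names keep_long_runs and hex_hamming and the sample data are made up for illustration.

```python
MIN_HANGER_FRAMES = 10  # from the post: half a second at 20 frames per second


def keep_long_runs(start_end, min_frames=MIN_HANGER_FRAMES):
    """Return a new list with only the [start, end] pairs spanning min_frames or more."""
    return [[s, e] for s, e in start_end if e - s >= min_frames]


def hex_hamming(hash_a, hash_b):
    """Bit-level Hamming distance between two equal-length hex strings."""
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


runs = [[0, 0], [1, 1], [2, 2], [3, 30], [31, 31], [32, 45]]

# Removing while iterating: the element after each removed one is skipped.
broken = [r[:] for r in runs]
for r in broken:
    if r[1] - r[0] < MIN_HANGER_FRAMES:
        broken.remove(r)
print(broken)                # [[1, 1], [3, 30], [32, 45]]

# Filtering into a new list keeps only the real hangers.
print(keep_long_runs(runs))  # [[3, 30], [32, 45]]

# Comparing hex strings character by character undercounts differing bits.
print(hex_hamming("f0", "0f"))  # 8 bits differ, but only 2 characters
```

Two further points may be worth checking, though both depend on the intent. The test hdistance < 10 in detect_hangers closes a run when two neighbouring frames are similar, whereas for hangers one would expect a run to end where the frames differ. Also, distance.hamming on the hex strings counts differing characters rather than bits; as far as I know, subtracting two imagehash hash objects (before converting them with str()) gives the bit-level distance directly.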
Reply

