Bottom Page

Thread Rating:
  • 2 Vote(s) - 2 Average
  • 1
  • 2
  • 3
  • 4
  • 5
 Find closest string pattern
#1
I need a way to identify the closest matching list based on the string based on the sequence of words.

for example.
my_list = ['one', 'two', 'three' , 'four']

option_one = ['two', 'three', 'one', 'four']
option_two = ['four', 'three', 'two', 'one']

The closest sequence of the two options to my_list is option_one. The order of the positions are not the same as my_list but they are closer than option_two.
edit: to clarify, I'm looking for the closest match reading left to right in sequence. Though it doesn't equal my_list, it closer that option_two.

And that's about where my brain stops. I'm not sure of an elegant way to solve this. I'm not looking for a coded solution, but if anyone has an idea on how to go about solving it, I'm all ears.

Thanks!
Quote
#2
"closest matching list" seems undefined here. Is this homework, where you have thorough instructions, or is the requirement coming from somewhere else?
nilamo and jmair like this post
Feel like you're not getting the answers you want? Checkout the help/rules for things like what to include/not include in a post, how to use code tags, how to ask smart questions, and more.

Pro-tip - there's an inverse correlation between the number of lines of code posted and my enthusiasm for helping with a question :)
Quote
#3
How do you know option_one is closer? You can't put that into code before you can put it into words.
Quote
#4
(Dec-12-2018, 07:37 PM)nilamo Wrote: How do you know option_one is closer? You can't put that into code before you can put it into words.

Though the string values don't line up exactly with my_list, the sequence is closer reading from left to right than option two.
Quote
#5
Every element of option_two is only one element away from my_list. option_one's "one" is two elements away. So why isn't option_two the closer match?
Quote
#6
(Dec-12-2018, 07:09 PM)micseydel Wrote: "closest matching list" seems undefined here. Is this homework, where you have thorough instructions, or is the requirement coming from somewhere else?
nah, not homework. I was listening to a lecture about NLP and the difference between identifying words in a string, identifying the sequence of words in a string and intent. It just had me wondering how to find the closest matching sentence based on matching word order from left to right.
ie.
my_string = "I that the dead beat cat"

option_one = "Is the cat dead"
option_two = "I beat the dead cat"

both contain some/all of the words as my_string. But one is closer to matching my_string on the sequence of words from left to right.
Quote
#7
You could use a metric similar to the edit distance, ie the minimal number of operations required to transform one list into another.
Quote
#8
(Dec-12-2018, 07:52 PM)nilamo Wrote: Every element of option_two is only one element away from my_list. option_one's "one" is two elements away. So why isn't option_two the closer match?

thank you for that perspective. I added an edit to help clarify that.

my_list = red, red, red, blue, black
option1 = blue, red
option2 = red, black

option2 shares the closest order as my_list reading left to right.
Quote
#9
Using the normalized compression distance, option 1 is better in the first example and option 2 is better in the second example
from zlib import compress

def ncd(a, b):
    """Normalized compression distance between two byte strings"""
    u =len(compress(a))
    v =len(compress(b))
    u, v =min(u, v), max (u, v)
    w =len(compress(a + b))
    return float (w - u) / v 

def l2b(alist):
    return ",".join(alist).encode()

def option_distances(alist, *options):
    b = l2b(alist)
    return [(ncd(b, l2b(option)), option)
            for option in options]


if __name__ == '__main__':
    my_list = ['one', 'two', 'three' , 'four']
    option_one = ['two', 'three', 'one', 'four']
    option_two = ['four', 'three', 'two', 'one']
    
    print(my_list)
    for d, option in option_distances(my_list, option_one, option_two):
        print(d, option)
    
    my_list = "red red red blue black".split()
    option1 = "blue red".split()
    option2 = "red black".split()
    
    print(my_list)
    for d, option in option_distances(my_list, option1, option2):
        print(d, option)
Output:
['one', 'two', 'three', 'four'] 0.19230769230769232 ['two', 'three', 'one', 'four'] 0.3076923076923077 ['four', 'three', 'two', 'one'] ['red', 'red', 'red', 'blue', 'black'] 0.43478260869565216 ['blue', 'red'] 0.391304347826087 ['red', 'black']
metulburr and jmair like this post
Quote
#10
(Dec-12-2018, 08:54 PM)Gribouillis Wrote: Using the normalized compression distance, option 1 is better in the first example and option 2 is better in the second example
from zlib import compress

def ncd(a, b):
    """Normalized compression distance between two byte strings"""
    u =len(compress(a))
    v =len(compress(b))
    u, v =min(u, v), max (u, v)
    w =len(compress(a + b))
    return float (w - u) / v 

def l2b(alist):
    return ",".join(alist).encode()

def option_distances(alist, *options):
    b = l2b(alist)
    return [(ncd(b, l2b(option)), option)
            for option in options]


if __name__ == '__main__':
    my_list = ['one', 'two', 'three' , 'four']
    option_one = ['two', 'three', 'one', 'four']
    option_two = ['four', 'three', 'two', 'one']
    
    print(my_list)
    for d, option in option_distances(my_list, option_one, option_two):
        print(d, option)
    
    my_list = "red red red blue black".split()
    option1 = "blue red".split()
    option2 = "red black".split()
    
    print(my_list)
    for d, option in option_distances(my_list, option1, option2):
        print(d, option)
Output:
['one', 'two', 'three', 'four'] 0.19230769230769232 ['two', 'three', 'one', 'four'] 0.3076923076923077 ['four', 'three', 'two', 'one'] ['red', 'red', 'red', 'blue', 'black'] 0.43478260869565216 ['blue', 'red'] 0.391304347826087 ['red', 'black']

Perfect, thanks. I'll look at that. Appreciate the direction.
Quote

Top Page

Possibly Related Threads...
Thread Author Replies Views Last Post
  finding the closest floating point number in a list Skaperen 17 529 Sep-19-2019, 10:39 PM
Last Post: Skaperen
  How to Find & Count String Patterns Between two Markers in a HTML file ahmedwaqas92 3 218 Aug-19-2019, 10:12 AM
Last Post: ahmedwaqas92
  Find string and add character - newbi PyDK 1 211 May-15-2019, 01:22 PM
Last Post: ichabod801
  How can I find a string in sequence (with Booleans)? go127a 3 301 Apr-23-2019, 01:58 PM
Last Post: ichabod801
  String Method 'find(...)'. ClassicalSoul 3 348 Feb-27-2019, 12:24 PM
Last Post: buran
  Find ? in string MuntyScruntfundle 5 431 Jan-17-2019, 08:11 PM
Last Post: buran
  How to find previous string of a string Prince_Bhatia 1 593 Sep-26-2018, 02:31 PM
Last Post: gruntfutuk
  Pattern Split String sripylearn 6 809 Aug-12-2018, 04:20 AM
Last Post: sripylearn
  How Do I find Index of a character in string? ilcaa72 5 826 May-23-2018, 11:44 PM
Last Post: wavic
  Beginner Q: failure to find string in list Jenny 2 599 Mar-18-2018, 01:07 PM
Last Post: Jenny

Forum Jump:


Users browsing this thread: 1 Guest(s)