Find closest string pattern - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: General Coding Help (https://python-forum.io/forum-8.html) +--- Thread: Find closest string pattern (/thread-14693.html) |
Find closest string pattern - jmair - Dec-12-2018 I need a way to identify the closest matching list based on the string based on the sequence of words. for example. my_list = ['one', 'two', 'three' , 'four'] option_one = ['two', 'three', 'one', 'four'] option_two = ['four', 'three', 'two', 'one'] The closest sequence of the two options to my_list is option_one. The order of the positions are not the same as my_list but they are closer than option_two. edit: to clarify, I'm looking for the closest match reading left to right in sequence. Though it doesn't equal my_list, it closer that option_two. And that's about where my brain stops. I'm not sure of an elegant way to solve this. I'm not looking for a coded solution, but if anyone has an idea on how to go about solving it, I'm all ears. Thanks! RE: Find closest string pattern - micseydel - Dec-12-2018 "closest matching list" seems undefined here. Is this homework, where you have thorough instructions, or is the requirement coming from somewhere else? RE: Find closest string pattern - nilamo - Dec-12-2018 How do you know option_one is closer? You can't put that into code before you can put it into words. RE: Find closest string pattern - jmair - Dec-12-2018 (Dec-12-2018, 07:37 PM)nilamo Wrote: How do you know option_one is closer? You can't put that into code before you can put it into words. Though the string values don't line up exactly with my_list, the sequence is closer reading from left to right than option two. RE: Find closest string pattern - nilamo - Dec-12-2018 Every element of option_two is only one element away from my_list. option_one's "one" is two elements away. So why isn't option_two the closer match? RE: Find closest string pattern - jmair - Dec-12-2018 (Dec-12-2018, 07:09 PM)micseydel Wrote: "closest matching list" seems undefined here. Is this homework, where you have thorough instructions, or is the requirement coming from somewhere else?nah, not homework. I was listening to a lecture about NLP and the difference between identifying words in a string, identifying the sequence of words in a string and intent. It just had me wondering how to find the closest matching sentence based on matching word order from left to right. ie. my_string = "I that the dead beat cat" option_one = "Is the cat dead" option_two = "I beat the dead cat" both contain some/all of the words as my_string. But one is closer to matching my_string on the sequence of words from left to right. RE: Find closest string pattern - Gribouillis - Dec-12-2018 You could use a metric similar to the edit distance, ie the minimal number of operations required to transform one list into another. RE: Find closest string pattern - jmair - Dec-12-2018 (Dec-12-2018, 07:52 PM)nilamo Wrote: Every element of option_two is only one element away from my_list. option_one's "one" is two elements away. So why isn't option_two the closer match? thank you for that perspective. I added an edit to help clarify that. my_list = red, red, red, blue, black option1 = blue, red option2 = red, black option2 shares the closest order as my_list reading left to right. RE: Find closest string pattern - Gribouillis - Dec-12-2018 Using the normalized compression distance, option 1 is better in the first example and option 2 is better in the second example from zlib import compress def ncd(a, b): """Normalized compression distance between two byte strings""" u =len(compress(a)) v =len(compress(b)) u, v =min(u, v), max (u, v) w =len(compress(a + b)) return float (w - u) / v def l2b(alist): return ",".join(alist).encode() def option_distances(alist, *options): b = l2b(alist) return [(ncd(b, l2b(option)), option) for option in options] if __name__ == '__main__': my_list = ['one', 'two', 'three' , 'four'] option_one = ['two', 'three', 'one', 'four'] option_two = ['four', 'three', 'two', 'one'] print(my_list) for d, option in option_distances(my_list, option_one, option_two): print(d, option) my_list = "red red red blue black".split() option1 = "blue red".split() option2 = "red black".split() print(my_list) for d, option in option_distances(my_list, option1, option2): print(d, option)
RE: Find closest string pattern - jmair - Dec-12-2018 (Dec-12-2018, 08:54 PM)Gribouillis Wrote: Using the normalized compression distance, option 1 is better in the first example and option 2 is better in the second example Perfect, thanks. I'll look at that. Appreciate the direction. |