Dec-12-2018, 09:17 PM
(Dec-12-2018, 08:54 PM) Gribouillis wrote: Using the normalized compression distance (lower means more similar), option 1 is better in the first example and option 2 is better in the second example.
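For reference, the `ncd` function in the code below computes the standard normalized compression distance. Here C(x) is the compressed length of x (in the code, `len(compress(x))` from zlib), and xy is the concatenation of x and y:

```latex
$$\mathrm{NCD}(x, y) = \frac{C(xy) - \min\{C(x), C(y)\}}{\max\{C(x), C(y)\}}$$
```

In the code, `u` is the min term, `v` is the max term, and `w` is C(xy). Values near 0 mean the two inputs share most of their structure. Because zlib is not an ideal compressor, results can land slightly outside [0, 1].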
from zlib import compress

def ncd(a, b):
    """Normalized compression distance between two byte strings"""
    u = len(compress(a))
    v = len(compress(b))
    u, v = min(u, v), max(u, v)
    w = len(compress(a + b))
    return float(w - u) / v

def l2b(alist):
    return ",".join(alist).encode()

def option_distances(alist, *options):
    b = l2b(alist)
    return [(ncd(b, l2b(option)), option) for option in options]

if __name__ == '__main__':
    my_list = ['one', 'two', 'three', 'four']
    option_one = ['two', 'three', 'one', 'four']
    option_two = ['four', 'three', 'two', 'one']
    print(my_list)
    for d, option in option_distances(my_list, option_one, option_two):
        print(d, option)

    my_list = "red red red blue black".split()
    option1 = "blue red".split()
    option2 = "red black".split()
    print(my_list)
    for d, option in option_distances(my_list, option1, option2):
        print(d, option)
Output:
['one', 'two', 'three', 'four']
0.19230769230769232 ['two', 'three', 'one', 'four']
0.3076923076923077 ['four', 'three', 'two', 'one']
['red', 'red', 'red', 'blue', 'black']
0.43478260869565216 ['blue', 'red']
0.391304347826087 ['red', 'black']
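If you only need the winning option rather than the full list of distances, one possible approach is to take the minimum by distance. This is a minimal sketch that assumes Python 3 (so plain `/` is true division). `to_bytes` and `closest_option` are hypothetical helper names; `to_bytes` mirrors `l2b` above.

```python
from zlib import compress


def ncd(a, b):
    """Normalized compression distance between two byte strings."""
    ca, cb = len(compress(a)), len(compress(b))
    lo, hi = min(ca, cb), max(ca, cb)
    return (len(compress(a + b)) - lo) / hi


def to_bytes(items):
    # Hypothetical helper, same encoding as l2b: comma-join, then UTF-8 encode
    return ",".join(items).encode()


def closest_option(target, *options):
    """Hypothetical helper: return the option with the smallest NCD to target.

    min() keeps the first option it sees when two distances are equal.
    """
    t = to_bytes(target)
    return min(options, key=lambda opt: ncd(t, to_bytes(opt)))


if __name__ == "__main__":
    my_list = "red red red blue black".split()
    print(closest_option(my_list, "blue red".split(), "red black".split()))
```

Given the distances in the output above, this would pick `['red', 'black']`. The exact values depend on the zlib build, so very close scores can flip between builds.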
Perfect, thanks. I'll look at that. Appreciate the direction.