Python Forum (https://python-forum.io), Python Coding > General Coding Help, thread /thread-11987.html
Exotic stats problem; mode, fuzzy clusters, etc - amyvaulhausen - Aug-03-2018

Hi, I'm working on a math problem and I'm not sure how to approach it. I'm trying to enumerate, rank and extract the most common numeric ranges from a list, with a twist.

Here is the basic operation. I have ten numeric values representing weights in various units (grams, ounces, etc.). Each value is unique and has a decimal part, as in the set below. In this set I am interested in the most common range of magnitude. Here that range is represented by three values: 295.999, 312.015 and 330.111.

The complete set:

102.35
8000.32
330.111
295.999
77.01
16.999
1099.222
645
890.01
312.015

What I want is to input a list of ten values like this and have a simple, easy way to derive the most common value by range. If the values were more static, for example if all the similar values in the list were exactly 310, I could just use the mode function and it would tell me this. Because the values are arbitrary decimals, though, I am stumped as to how to accomplish this.

I came across Python fuzzy clustering, and it looks like it might work, possibly in combination with mode, but is there a simpler, easier, faster way to do this? The end goal is to do pattern analysis on a list of numbers and return the most common range of highest magnitude. For the list above, the desired output would be the three values, printed to the screen, written to a text file, or stored in a variable.

Is there a better way to do this than:
1.) Fuzzy clustering
2.) Mode (most common value) of discrete data

What do I mean by "most common value by range"? In any set of ten values, a few values will recur, but with slight variance.
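One simpler alternative to fuzzy clustering, sketched here under stated assumptions: sort the values and start a new group wherever the jump between neighbours is large. Because the weights span roughly 17 to 8000, "close" is measured as a ratio rather than a fixed difference. The helper name `split_on_gaps` and the cut-off `max_ratio=1.15` (a new group starts when a value is more than 15% above the previous one) are hypothetical choices, and the values are assumed to be positive. This is plain sort-and-split grouping (1D single-linkage), not fuzzy c-means.

```python
def split_on_gaps(values, max_ratio=1.15):
    """Hypothetical helper: sort positive values and start a new group
    whenever a value is more than max_ratio times the previous one, so
    'close' scales with magnitude."""
    if not values:
        return []
    ordered = sorted(values)
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur / prev > max_ratio:
            groups.append([cur])      # jump too large: open a new group
        else:
            groups[-1].append(cur)    # close enough: extend the current group
    return groups


weights = [102.35, 8000.32, 330.111, 295.999, 77.01,
           16.999, 1099.222, 645, 890.01, 312.015]

groups = split_on_gaps(weights)
largest = max(groups, key=len)        # first group with the most members
print(largest)                        # [295.999, 312.015, 330.111]
```

One caveat with this design: neighbours are linked pairwise, so a long run of values each within 15% of the next would end up in one group even if its two ends are far apart.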
So for the set above, the most common range value would be about 300, with minor variance for each instance. What I need is a function that can tell which magnitude occurs most often even though the individual values are not fixed. If three of the ten numbers were exactly equal, I could just use a mode function, but here the magnitudes are comparable rather than exact, so the standard mode won't work. Surely there must be a way to achieve this?

Why am I using three common range values? "Three" was only an example in the original post. We have a fixed data set of ten decimal values. A certain number of them will share a similar magnitude and the other values will be random. That shared magnitude could lie anywhere across the range of possible values. I chose three values only to illustrate the concept; out of ten there could just as well be four, five or six values with similar magnitudes.

Another way to view this problem is as a sensor array: each of the ten values in the set is a pulse reading represented by a decimal number, and we want to interpret similar values across the set as something like pressure, a curve, etc.

The point is that a Python mode function can extract the most common value from a list where the values are exactly alike, but here we cannot use mode, because the values are only approximately alike, such as 295.999, 312.015 and 330.111.

RE: Exotic stats problem; mode, fuzzy clusters, etc - Vysero - Aug-03-2018

I will start by trying to simplify the question so that I can understand it (I got a C in stats!). What you want to do is find the mode of a data set when the data set does not contain exact replicates, only similar values, correct?
For example, find the mode of:

x = [12, 11, 10.4, 12, 10.9, 10.4, 15]

which will not work, because it raises:

statistics.StatisticsError: no unique mode; found 2 equally common values

Is that correct?

RE: Exotic stats problem; mode, fuzzy clusters, etc - amyvaulhausen - Aug-03-2018

(Replying to Vysero, Aug-03-2018, 07:50 PM)

Thanks Vysero! Yes, mode would work if the values were exact, but what I essentially need is to perform this kind of function on approximate numbers that are within, say, +/- 10 of each other, while still allowing for decimal values.

RE: Exotic stats problem; mode, fuzzy clusters, etc - Vysero - Aug-03-2018

(Replying to amyvaulhausen, Aug-03-2018, 07:53 PM: "what I essentially need...")

If you have a range of decimal values that are acceptable, then I would suggest you round them:

```python
from statistics import mode

x = [12, 11, 10.4, 12, 10.3, 10.4, 15]
new_list = []
for y in x:
    # round() without ndigits returns the nearest integer, so 10.3 and 10.4 both become 10
    new_list.append(round(y))

print('new_list = ', new_list, 'mode of new_list = ', mode(new_list))
```

Output:

```
new_list =  [12, 11, 10, 12, 10, 10, 15] mode of new_list =  10
```

Is that acceptable, or must you keep the decimal values?

RE: Exotic stats problem; mode, fuzzy clusters, etc - amyvaulhausen - Aug-04-2018

(Replying to Vysero, Aug-03-2018, 08:06 PM)

Thank you, sir, I honestly appreciate the kind feedback! :) In my situation I need a mode-like function, but I do not think mode will work here, much as I wish it did. Ideally the decimals would be retained, but it would be OK to lose them. The problem is that I am dealing with randomized data, except that numeric trends of similar magnitude often occur within a set. It is the values of similar magnitude I want to capture.
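Vysero's rounding idea can be applied to the ten weights from the first post by rounding to the nearest hundred instead of the nearest integer. In the sketch below, the helper `most_common_bucket` and its default bucket size (`ndigits=-2`, i.e. nearest hundred) are assumptions for illustration. It uses `statistics.multimode` (Python 3.8+), which returns every value tied for the highest count instead of raising. Note also that since Python 3.8, `statistics.mode` returns the first mode it encounters rather than raising the `StatisticsError` quoted above.

```python
from statistics import multimode


def most_common_bucket(values, ndigits=-2):
    """Hypothetical helper: round each value (ndigits=-2 -> nearest hundred)
    and return the most common rounded value(s) plus the original values
    that fell into them."""
    buckets = [round(v, ndigits) for v in values]
    winners = multimode(buckets)      # all buckets tied for the highest count
    members = [v for v, b in zip(values, buckets) if b in winners]
    return winners, members


weights = [102.35, 8000.32, 330.111, 295.999, 77.01,
           16.999, 1099.222, 645, 890.01, 312.015]

winners, members = most_common_bucket(weights)
print(winners, members)               # [300.0] [330.111, 295.999, 312.015]
```

The trade-off is that bucket edges are fixed: 249 and 251 land in different buckets (200 and 300) even though they are only 2 apart.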
For example, if three of ten numbers are close to 300 but may range up or down by about ten, I want to grab those values. Take 299, 308 and 303: they have similar magnitudes, but because they are not exactly equal, mode won't return them, if I understand correctly.

I like your suggestion, but I'm wondering: would a simple iterative loop work, one that compares each number to all the other numbers with an if statement checking whether it falls within lower and upper bounds of +/- 10? Or is there a simpler way?

RE: Exotic stats problem; mode, fuzzy clusters, etc - Vysero - Aug-06-2018

(Replying to amyvaulhausen, Aug-04-2018, 12:03 AM: "do you think some kind of simple iterative...")

Oh, I see now. I think the best way to approach that would be for you to write the code. Then, if you wish, you can post it here and everyone can take a gander to see whether they can improve the logic in some way. I was under the impression that you had your heart set on using the mode() function.
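The pairwise loop proposed above can be sketched in a few lines. `densest_neighbourhood` and its `tolerance` parameter are hypothetical names: for each value, the function collects every value within +/- tolerance of it and keeps the largest such group. Comparing every pair is O(n^2), which is negligible for ten values.

```python
def densest_neighbourhood(values, tolerance=10.0):
    """Hypothetical helper: for each value, gather every value within
    +/- tolerance of it (itself included) and return the largest group."""
    best = []
    for centre in values:
        group = [v for v in values if abs(v - centre) <= tolerance]
        if len(group) > len(best):    # strict '>' keeps the first of any tie
            best = group
    return best


weights = [102.35, 8000.32, 330.111, 295.999, 77.01,
           16.999, 1099.222, 645, 890.01, 312.015]

print(densest_neighbourhood(weights, tolerance=10))   # [102.35]
print(densest_neighbourhood(weights, tolerance=20))   # [330.111, 295.999, 312.015]
```

On the ten weights from the first post, a tolerance of 10 finds no two values within 10 of each other, so it returns just the first value; a tolerance of 20 recovers the three values near 300. Because the window is centred on a data point, members can be up to twice the tolerance apart (295.999 and 330.111 differ by about 34).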