I'm porting some R scripts to Python, more specifically the Levene, Shapiro-Wilk, ANOVA, Kruskal-Wallis, Tukey HSD and Wilcoxon rank-sum tests.

Most of it is done with scipy, so that part is fine. However, the alphabetical labeling feature of multcompLetters (from the R package multcompView) is hard to reproduce in Python. Its purpose is to group treatments that are not significantly different (p > 0.05), based on the p-values obtained from the pairwise comparisons (here, Tukey and Wilcoxon tests).
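Whatever algorithm assigns the letters, the input it really needs is the set of pairs that are *not* significantly different at the chosen alpha. A minimal sketch, assuming the "X vs Y" key format used in the example code and alpha = 0.05; non_significant_pairs is a hypothetical helper name. Using frozensets makes a pair independent of how its key is written ("D vs C" and "C vs D" become the same pair).

```python
def non_significant_pairs(p_values: dict, alpha: float = 0.05) -> set:
    """Return pairs with p > alpha as order-independent frozensets (hypothetical helper)."""
    pairs = set()
    for key, props in p_values.items():
        a, b = key.split(" vs ")  # assumes keys look like "A vs B"
        if props["p_value"] > alpha:
            pairs.add(frozenset((a, b)))
    return pairs


p_dict_tukey = {
    "A vs B": {"p_value": 0.8336},
    "A vs D": {"p_value": 0.0000},
    "A vs C": {"p_value": 0.2469},
    "B vs D": {"p_value": 0.0000},
    "B vs C": {"p_value": 0.6896},
    "D vs C": {"p_value": 0.0004},
}

# Sorted only to make the printed result deterministic
print(sorted(sorted(pair) for pair in non_significant_pairs(p_dict_tukey)))
# [['A', 'B'], ['A', 'C'], ['B', 'C']]
```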

I'm trying to cover two edge cases: one where a sample group sits 'in between' two others (so it should get two letters, e.g. 'ab'), and another where three groups are not significantly different from each other and should all receive the same letter.
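Both edge cases can be checked mechanically: two groups must share at least one letter exactly when their p-value is above 0.05, and no letter should cover a subset of the groups already covered by another single letter. A sketch under those assumptions; is_valid_display and redundant_letters are hypothetical helpers, and labels are assumed to be strings such as "ab".

```python
def is_valid_display(labels: dict, p_values: dict, alpha: float = 0.05) -> bool:
    """True if every pair shares a letter exactly when its p-value exceeds alpha."""
    for key, props in p_values.items():
        a, b = key.split(" vs ")
        shares_letter = bool(set(labels[a]) & set(labels[b]))
        not_different = props["p_value"] > alpha
        if shares_letter != not_different:
            return False
    return True


def redundant_letters(labels: dict) -> list:
    """Letters whose groups are all covered by another single letter.

    Two letters covering identical groups are both reported.
    """
    members = {}
    for group, letters in labels.items():
        for letter in letters:
            members.setdefault(letter, set()).add(group)
    return sorted(
        x for x in members
        if any(x != y and members[x] <= members[y] for y in members)
    )


p_dict_wilcox = {
    "A vs B": {"p_value": 0.1171},
    "A vs D": {"p_value": 0.0090},
    "A vs C": {"p_value": 0.0282},
    "B vs D": {"p_value": 0.0090},
    "B vs C": {"p_value": 0.0758},
    "D vs C": {"p_value": 0.0162},
}
p_dict_tukey = {
    "A vs B": {"p_value": 0.8336},
    "A vs D": {"p_value": 0.0000},
    "A vs C": {"p_value": 0.2469},
    "B vs D": {"p_value": 0.0000},
    "B vs C": {"p_value": 0.6896},
    "D vs C": {"p_value": 0.0004},
}

expected_wilcox = {"A": "a", "B": "ab", "C": "b", "D": "c"}
expected_tukey = {"A": "a", "B": "a", "C": "a", "D": "b"}
current_tukey = {"A": "a", "B": "ab", "C": "ab", "D": "c"}  # what match_groups() prints

print(is_valid_display(expected_wilcox, p_dict_wilcox))  # True
print(is_valid_display(expected_tukey, p_dict_tukey))    # True
print(is_valid_display(current_tukey, p_dict_tukey))     # True
print(redundant_letters(current_tukey))                  # ['b']
print(redundant_letters(expected_tukey))                 # []
```

Notably, the labels the current code produces for the Tukey data pass the validity check. The problem is that letter 'b' is redundant (B and C already share 'a' with A), which is also why D ends up with 'c' instead of 'b'.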

Here is the minimal example code:

```python
from pprint import pprint


def match_groups(means: dict, p_values: dict, descending=True):
    def alphabetize(num_rank: int) -> str:
        return chr(97 + num_rank)

    # Sort groups by their means
    sorted_means = dict(sorted(means.items(), key=lambda x: x[1], reverse=descending))
    result = {group: {"mean": mean, "rank": set()} for group, mean in sorted_means.items()}

    # Assign a decreasing rank to every matching pair
    rank = 0
    for group in result:
        significant_diff = False
        for pair, properties in p_values.items():
            p_value = properties["p_value"]
            a, b = pair.split(" vs ")
            if group == a and p_value > 0.05:
                significant_diff = True
                # print(a, "vs", b, "> 0.05. rank=", rank)
                result[a]["rank"].add(alphabetize(rank))
                result[b]["rank"].add(alphabetize(rank))
        if significant_diff:
            rank += 1

    # Assign a descending rank for non-matching pairs
    for group, properties in result.items():
        if not properties["rank"]:
            properties["rank"].add(alphabetize(rank))
            rank += 1

    return result


means_dict = {
    "A": 63,
    "D": 45,
    "B": 61,
    "C": 58,
}

p_dict_wilcox = {
    "A vs B": {"p_value": 0.1171},
    "A vs D": {"p_value": 0.0090},
    "A vs C": {"p_value": 0.0282},
    "B vs D": {"p_value": 0.0090},
    "B vs C": {"p_value": 0.0758},
    "D vs C": {"p_value": 0.0162},
}

p_dict_tukey = {
    "A vs B": {"p_value": 0.8336},
    "A vs D": {"p_value": 0.0000},
    "A vs C": {"p_value": 0.2469},
    "B vs D": {"p_value": 0.0000},
    "B vs C": {"p_value": 0.6896},
    "D vs C": {"p_value": 0.0004},
}

print("Wilcoxon:")
groups = match_groups(means_dict, p_dict_wilcox)
pprint(groups)
print("Expected: a, ab, b, c")

print("\nTukey:")
groups = match_groups(means_dict, p_dict_tukey)
pprint(groups)
print("Expected: a, a, a, b")
```

And the output:

```
Wilcoxon:
{'A': {'mean': 63, 'rank': {'a'}},
 'B': {'mean': 61, 'rank': {'a', 'b'}},
 'C': {'mean': 58, 'rank': {'b'}},
 'D': {'mean': 45, 'rank': {'c'}}}
Expected: a, ab, b, c

Tukey:
{'A': {'mean': 63, 'rank': {'a'}},
 'B': {'mean': 61, 'rank': {'a', 'b'}},
 'C': {'mean': 58, 'rank': {'a', 'b'}},
 'D': {'mean': 45, 'rank': {'c'}}}
Expected: a, a, a, b
```

As you can see, the first edge case (the Wilcoxon data) works as expected. For the second (the Tukey data), however, the ranking fails to recognize that sample groups A, B and C are not significantly different from each other and should all get the same letter.
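A likely cause: the loop opens a new letter for every group that has a non-significant partner written to its right in the p-value keys, even when that pair already shares a letter. For the Tukey data, B vs C opens 'b' although B and C already share 'a' with A. One direction that may help is to split letters on *significant* pairs instead of building them from non-significant ones. The sketch below follows the insert-and-absorb idea from Piepho (2004), "An Algorithm for a Letter-Based Representation of All-Pairwise Comparisons", which to my knowledge is the reference multcompView cites for multcompLetters. It is not a port of the R code: it omits the paper's final "sweep" step (so larger designs may get more letters than R prints), names letters by mean order, and treats any pair missing from the p-value dict as not significant. letter_display and absorb are hypothetical names.

```python
from itertools import combinations


def absorb(columns: list) -> list:
    """Keep only maximal letter columns: drop duplicates and strict subsets."""
    kept = []
    for col in columns:
        if any(col <= other for other in kept):
            continue  # duplicate of, or contained in, a column already kept
        kept = [other for other in kept if not other < col]
        kept.append(col)
    return kept


def letter_display(means: dict, p_values: dict, alpha: float = 0.05,
                   descending: bool = True) -> dict:
    # Groups ordered by mean, so letter 'a' goes to the top-ranked group
    groups = sorted(means, key=means.get, reverse=descending)

    # Significantly different pairs; frozenset makes "D vs C" == "C vs D"
    significant = set()
    for key, props in p_values.items():
        a, b = key.split(" vs ")
        if props["p_value"] <= alpha:
            significant.add(frozenset((a, b)))

    # Insert: start with one letter shared by every group...
    columns = [set(groups)]
    for i, j in combinations(groups, 2):
        if frozenset((i, j)) not in significant:
            continue
        split = []
        for col in columns:
            if i in col and j in col:
                # ...and split every letter that still joins a significant pair
                split.extend([col - {i}, col - {j}])
            else:
                split.append(col)
        # Absorb: a letter whose groups all share another letter is redundant
        columns = absorb(split)

    # Name letters in order of the best-ranked group each one covers
    columns.sort(key=lambda col: sorted(groups.index(g) for g in col))
    return {
        g: "".join(chr(97 + k) for k, col in enumerate(columns) if g in col)
        for g in groups
    }


means_dict = {"A": 63, "D": 45, "B": 61, "C": 58}
p_dict_wilcox = {
    "A vs B": {"p_value": 0.1171}, "A vs D": {"p_value": 0.0090},
    "A vs C": {"p_value": 0.0282}, "B vs D": {"p_value": 0.0090},
    "B vs C": {"p_value": 0.0758}, "D vs C": {"p_value": 0.0162},
}
p_dict_tukey = {
    "A vs B": {"p_value": 0.8336}, "A vs D": {"p_value": 0.0000},
    "A vs C": {"p_value": 0.2469}, "B vs D": {"p_value": 0.0000},
    "B vs C": {"p_value": 0.6896}, "D vs C": {"p_value": 0.0004},
}

print(letter_display(means_dict, p_dict_wilcox))  # {'A': 'a', 'B': 'ab', 'C': 'b', 'D': 'c'}
print(letter_display(means_dict, p_dict_tukey))   # {'A': 'a', 'B': 'a', 'C': 'a', 'D': 'b'}
```

Because letters are only ever split apart on significant pairs, every non-significant pair keeps at least one shared letter, and the absorb step is what removes the redundant 'b' seen in the Tukey output above.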

Hopefully my explanation is clear enough.

Any hints would be highly appreciated!