Python Forum

I'm porting some R scripts to Python, specifically the Levene, Shapiro-Wilk, ANOVA, Kruskal-Wallis, Tukey HSD and Wilcoxon rank-sum tests.

Most of it is done with the scipy module, so that part is fine. However, there is an alphabetical labeling feature, multCompLetters, which I find hard to reproduce in Python. Its purpose is to give a shared letter to treatments that are not significantly different (p > 0.05), based on the p-values obtained from pairwise comparisons (here with the Tukey and Wilcoxon tests).

I'm trying to cover two edge cases: one where a sample group is 'in between' two others, and another where three groups are not significantly different from each other and should all receive the same 'rank' label.

Here is the minimal example code:

from pprint import pprint


def match_groups(means: dict, p_values: dict, descending=True):
    def alphabetize(num_rank: int) -> str:
        return chr(97 + num_rank)

    # Sort groups by their means
    sorted_means = dict(sorted(means.items(), key=lambda x: x[1], reverse=descending))
    result = {group: {"mean": mean, "rank": set()} for group, mean in sorted_means.items()}

    # Assign a shared rank letter to each pair that is not significantly different
    rank = 0
    for group in result:
        significant_diff = False
        for pair, properties in p_values.items():
            p_value = properties["p_value"]
            a, b = pair.split(" vs ")
            if group == a and p_value > 0.05:
                significant_diff = True
                # print(a, "vs", b, "> 0.05. rank=", rank)
                result[a]["rank"].add(alphabetize(rank))
                result[b]["rank"].add(alphabetize(rank))
        if significant_diff:
            rank += 1

    # Assign a new rank letter to each group that matched no other group
    for group, properties in result.items():
        if not properties["rank"]:
            properties["rank"].add(alphabetize(rank))
            rank += 1

    return result


means_dict = {
    "A": 63,
    "D": 45,
    "B": 61,
    "C": 58,
}

p_dict_wilcox = {
    ("A vs B"): {"p_value": 0.1171},
    ("A vs D"): {"p_value": 0.0090},
    ("A vs C"): {"p_value": 0.0282},
    ("B vs D"): {"p_value": 0.0090},
    ("B vs C"): {"p_value": 0.0758},
    ("D vs C"): {"p_value": 0.0162},
}

p_dict_tukey = {
    ("A vs B"): {"p_value": 0.8336},
    ("A vs D"): {"p_value": 0.0000},
    ("A vs C"): {"p_value": 0.2469},
    ("B vs D"): {"p_value": 0.0000},
    ("B vs C"): {"p_value": 0.6896},
    ("D vs C"): {"p_value": 0.0004},
}

print("Wilcoxon:")
groups = match_groups(means_dict, p_dict_wilcox)
pprint(groups)
print("Expected: a, ab, b, c")

print("\nTukey:")
groups = match_groups(means_dict, p_dict_tukey)
pprint(groups)
print("Expected: a, a, a, b")

And the output:

Wilcoxon:
{'A': {'mean': 63, 'rank': {'a'}},
 'B': {'mean': 61, 'rank': {'a', 'b'}},
 'C': {'mean': 58, 'rank': {'b'}},
 'D': {'mean': 45, 'rank': {'c'}}}
Expected: a, ab, b, c

Tukey:
{'A': {'mean': 63, 'rank': {'a'}},
 'B': {'mean': 61, 'rank': {'a', 'b'}},
 'C': {'mean': 58, 'rank': {'a', 'b'}},
 'D': {'mean': 45, 'rank': {'c'}}}
Expected: a, a, a, b

As you can see, the first edge case (here with the Wilcoxon data) works as expected. For the second, however, the ranking fails to recognize that sample groups A, B and C are not significantly different from each other and should share a single letter.

Hopefully my explanation is clear enough.
Any hints would be highly appreciated!

The problem is that we don't have a description of the algorithm followed by multCompLetters in R. Obviously this problem is related to graph coloring but I don't think the algorithm is simple enough to be guessed just like that without further information.
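
One way to make the graph idea concrete, offered only as a sketch and not as a reimplementation of multCompLetters: treat every group as a node, connect two groups when their p-value is above the threshold, and give one letter to each maximal clique of that graph (a set of groups that are all mutually not significantly different). The function name letter_groups and the alpha parameter are made up for this example; the input format is the one used in the question. The cliques are found with a plain Bron-Kerbosch search, which is fine for a handful of groups.

```python
def letter_groups(means: dict, p_values: dict, alpha=0.05, descending=True) -> dict:
    """Sketch: one letter per maximal clique of 'not significantly different' groups."""
    # Order groups by mean so the letters follow the ranking of the means
    order = sorted(means, key=means.get, reverse=descending)

    # Build the graph: an edge links two groups whose p-value is above alpha
    neighbors = {group: set() for group in order}
    for pair, properties in p_values.items():
        a, b = pair.split(" vs ")
        if properties["p_value"] > alpha:
            neighbors[a].add(b)
            neighbors[b].add(a)

    # Bron-Kerbosch (no pivoting): collect every maximal clique
    cliques = []

    def bron_kerbosch(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(current)
            return
        for node in list(candidates):
            bron_kerbosch(
                current | {node},
                candidates & neighbors[node],
                excluded & neighbors[node],
            )
            candidates = candidates - {node}
            excluded = excluded | {node}

    bron_kerbosch(set(), set(order), set())

    # Sort cliques by the positions of their members in the mean ordering
    position = {group: i for i, group in enumerate(order)}
    cliques.sort(key=lambda clique: sorted(position[g] for g in clique))

    # Each clique contributes one letter to all of its members
    labels = {group: "" for group in order}
    for i, clique in enumerate(cliques):
        for group in clique:
            labels[group] += chr(97 + i)
    return labels
```

With the data from the question, letter_groups(means_dict, p_dict_wilcox) gives a, ab, b, c and letter_groups(means_dict, p_dict_tukey) gives a, a, a, b for A, B, C, D. Whether this matches multCompLetters on every input is an open assumption; it only agrees on these two cases.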

Alfalfa

Gribouillis