Python Forum

Full Version: Mann Whitney U-test on several data sets
Hi,

I'm really struggling to find a way to do the following:

Suppose I have two groups of data sets (fictitious in this example):

group_a = [1, 5, 7, 3, 5, 8, 34]
group_b = [1, 2, 4, 3, 5, 8, 45]
group_c = [1, 5, 7, 3, 5, 8, 35]

group_1 = [1, 2, 7, 3, 5, 8, 56]
group_2 = [1, 5, 7, 3, 5, 8, 23]
group_3 = [1, 4, 6, 3, 5, 8, 25]
group_4 = [1, 5, 7, 8, 5, 8, 45]
group_5 = [1, 3, 7, 3, 5, 8, 15]
group_6 = [1, 5, 7, 3, 5, 8, 16]

and I need to perform a Mann-Whitney U test on every possible pairing of a letter group with a number group. That is, I want a result for each of the following combinations:

(group_a, group_1)
(group_a, group_2)
(group_a, group_3)
(group_a, group_4)
(group_a, group_5)
(group_a, group_6)
(group_b, group_1)
(group_b, group_2)
(group_b, group_3)
(group_b, group_4)
(group_b, group_5)
(group_b, group_6)
(group_c, group_1)
(group_c, group_2)
(group_c, group_3)
(group_c, group_4)
(group_c, group_5)
(group_c, group_6)

(But in reality there are more letter groups and many more number groups).

Is there an efficient way to do this?

Unfortunately, I'm quite new to Python and am self-taught.

Any advice regarding this would be really appreciated.

Additionally, a lot of my work requires comparisons like this, so any suggestions of books, courses, or anything else that would help me with it would also be amazing.

I currently work as a Data Analyst and am looking to transition into Statistics, so I'm trying to do my regular work to a higher standard and to use Python as much as I can going forward.

Thanks.
You want better data structures: put the groups into lists instead of separate variables. Then product from itertools does what its name says.
It makes the Cartesian product of iterables: https://docs.python.org/3/library/iterto...ls.product

from itertools import product


groups_first = [
	[1, 5, 7, 3, 5, 8, 34],
	[1, 2, 4, 3, 5, 8, 45],
	[1, 5, 7, 3, 5, 8, 35],
]

groups_second = [
	[1, 2, 7, 3, 5, 8, 56],
	[1, 5, 7, 3, 5, 8, 23],
	[1, 4, 6, 3, 5, 8, 25],
	[1, 5, 7, 8, 5, 8, 45],
	[1, 3, 7, 3, 5, 8, 15],
	[1, 5, 7, 3, 5, 8, 16],
]


print("Without indices")
for first_group, second_group in product(groups_first, groups_second):
	print(first_group, second_group)

print()
print("With indices")
# to get the indices for groups_first and groups_second, you can use enumerate
iterator = product(enumerate(groups_first), enumerate(groups_second))
# in addition, you can use tuple unpacking
for (first_idx, first_group), (second_idx, second_group) in iterator:
	print(first_idx, first_group, second_idx, second_group)

# this won't work, because product yields two (index, group) tuples per step,
# so each step has only two items to unpack, not four
#for first_idx, first_group, second_idx, second_group in iterator:
#	print(first_idx, first_group, second_idx, second_group)
Quote:Without indices
[1, 5, 7, 3, 5, 8, 34] [1, 2, 7, 3, 5, 8, 56]
[1, 5, 7, 3, 5, 8, 34] [1, 5, 7, 3, 5, 8, 23]
[1, 5, 7, 3, 5, 8, 34] [1, 4, 6, 3, 5, 8, 25]
[1, 5, 7, 3, 5, 8, 34] [1, 5, 7, 8, 5, 8, 45]
[1, 5, 7, 3, 5, 8, 34] [1, 3, 7, 3, 5, 8, 15]
[1, 5, 7, 3, 5, 8, 34] [1, 5, 7, 3, 5, 8, 16]
[1, 2, 4, 3, 5, 8, 45] [1, 2, 7, 3, 5, 8, 56]
[1, 2, 4, 3, 5, 8, 45] [1, 5, 7, 3, 5, 8, 23]
[1, 2, 4, 3, 5, 8, 45] [1, 4, 6, 3, 5, 8, 25]
[1, 2, 4, 3, 5, 8, 45] [1, 5, 7, 8, 5, 8, 45]
[1, 2, 4, 3, 5, 8, 45] [1, 3, 7, 3, 5, 8, 15]
[1, 2, 4, 3, 5, 8, 45] [1, 5, 7, 3, 5, 8, 16]
[1, 5, 7, 3, 5, 8, 35] [1, 2, 7, 3, 5, 8, 56]
[1, 5, 7, 3, 5, 8, 35] [1, 5, 7, 3, 5, 8, 23]
[1, 5, 7, 3, 5, 8, 35] [1, 4, 6, 3, 5, 8, 25]
[1, 5, 7, 3, 5, 8, 35] [1, 5, 7, 8, 5, 8, 45]
[1, 5, 7, 3, 5, 8, 35] [1, 3, 7, 3, 5, 8, 15]
[1, 5, 7, 3, 5, 8, 35] [1, 5, 7, 3, 5, 8, 16]

With indices
0 [1, 5, 7, 3, 5, 8, 34] 0 [1, 2, 7, 3, 5, 8, 56]
0 [1, 5, 7, 3, 5, 8, 34] 1 [1, 5, 7, 3, 5, 8, 23]
0 [1, 5, 7, 3, 5, 8, 34] 2 [1, 4, 6, 3, 5, 8, 25]
0 [1, 5, 7, 3, 5, 8, 34] 3 [1, 5, 7, 8, 5, 8, 45]
0 [1, 5, 7, 3, 5, 8, 34] 4 [1, 3, 7, 3, 5, 8, 15]
0 [1, 5, 7, 3, 5, 8, 34] 5 [1, 5, 7, 3, 5, 8, 16]
1 [1, 2, 4, 3, 5, 8, 45] 0 [1, 2, 7, 3, 5, 8, 56]
1 [1, 2, 4, 3, 5, 8, 45] 1 [1, 5, 7, 3, 5, 8, 23]
1 [1, 2, 4, 3, 5, 8, 45] 2 [1, 4, 6, 3, 5, 8, 25]
1 [1, 2, 4, 3, 5, 8, 45] 3 [1, 5, 7, 8, 5, 8, 45]
1 [1, 2, 4, 3, 5, 8, 45] 4 [1, 3, 7, 3, 5, 8, 15]
1 [1, 2, 4, 3, 5, 8, 45] 5 [1, 5, 7, 3, 5, 8, 16]
2 [1, 5, 7, 3, 5, 8, 35] 0 [1, 2, 7, 3, 5, 8, 56]
2 [1, 5, 7, 3, 5, 8, 35] 1 [1, 5, 7, 3, 5, 8, 23]
2 [1, 5, 7, 3, 5, 8, 35] 2 [1, 4, 6, 3, 5, 8, 25]
2 [1, 5, 7, 3, 5, 8, 35] 3 [1, 5, 7, 8, 5, 8, 45]
2 [1, 5, 7, 3, 5, 8, 35] 4 [1, 3, 7, 3, 5, 8, 15]
2 [1, 5, 7, 3, 5, 8, 35] 5 [1, 5, 7, 3, 5, 8, 16]
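To connect this back to the actual question, here is a rough sketch of how the same product loop could feed each pair into a test. The dictionary names (letter_groups, number_groups) and the helper u_statistic are made up for this illustration. The helper only computes the U statistic by counting pairs, so that the example runs without extra packages. If SciPy is installed, scipy.stats.mannwhitneyu(x, y, alternative="two-sided") could be called in its place and should return a p-value along with the statistic.

```python
from itertools import product

# Hypothetical layout: name each data set so every result can be labelled.
letter_groups = {
    "group_a": [1, 5, 7, 3, 5, 8, 34],
    "group_b": [1, 2, 4, 3, 5, 8, 45],
    "group_c": [1, 5, 7, 3, 5, 8, 35],
}
number_groups = {
    "group_1": [1, 2, 7, 3, 5, 8, 56],
    "group_2": [1, 5, 7, 3, 5, 8, 23],
    "group_3": [1, 4, 6, 3, 5, 8, 25],
    "group_4": [1, 5, 7, 8, 5, 8, 45],
    "group_5": [1, 3, 7, 3, 5, 8, 15],
    "group_6": [1, 5, 7, 3, 5, 8, 16],
}


def u_statistic(x, y):
    """Illustrative helper: U statistic of x versus y by pair counting.

    Each pair (a, b) with a > b counts 1, each tie counts 0.5.
    This gives no p-value; it stands in for a library call.
    """
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in x
        for b in y
    )


# One entry per (letter group, number group) pair, keyed by the two names.
results = {}
pairs = product(letter_groups.items(), number_groups.items())
for (name_x, x), (name_y, y) in pairs:
    results[(name_x, name_y)] = u_statistic(x, y)

for (name_x, name_y), u in results.items():
    print(name_x, name_y, u)
```

Keeping the results in a dictionary keyed by the two group names means that adding more letter or number groups only requires new dictionary entries; the loop itself stays the same.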
Hi,

Many thanks for this; it may be just what I need to get through this.

Thanks again.