Python Forum
Mann Whitney U-test on several data sets
#1
Hi,

I'm really struggling to find a way to do the following:

Suppose I have two groups of data sets (fictitious in this example):

group_a = [1, 5, 7, 3, 5, 8, 34]
group_b = [1, 2, 4, 3, 5, 8, 45]
group_c = [1, 5, 7, 3, 5, 8, 35]

group_1 = [1, 2, 7, 3, 5, 8, 56]
group_2 = [1, 5, 7, 3, 5, 8, 23]
group_3 = [1, 4, 6, 3, 5, 8, 25]
group_4 = [1, 5, 7, 8, 5, 8, 45]
group_5 = [1, 3, 7, 3, 5, 8, 15]
group_6 = [1, 5, 7, 3, 5, 8, 16]

and I need to perform a Mann-Whitney U test on every combination of one letter group with one number group. That is, I want a result for each of the following pairs:

(group_a, group_1)
(group_a, group_2)
(group_a, group_3)
(group_a, group_4)
(group_a, group_5)
(group_a, group_6)
(group_b, group_1)
(group_b, group_2)
(group_b, group_3)
(group_b, group_4)
(group_b, group_5)
(group_b, group_6)
(group_c, group_1)
(group_c, group_2)
(group_c, group_3)
(group_c, group_4)
(group_c, group_5)
(group_c, group_6)

(But in reality there are more letter groups and many more number groups).

Is there an efficient way to do this?

Unfortunately, I'm quite new to Python and am self-taught.

Any advice regarding this would be really appreciated.

Additionally, a lot of my work requires comparisons like this, so any suggestions of books, courses, or anything else that would help me with this would be amazing.

I currently work as a Data Analyst and am looking to transition into Statistics, so I'm trying to do my regular work at a higher level and to use Python as much as I can going forward.

Thanks.
Reply
#2
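You want better data structures: put each set of groups in a list instead of keeping one variable per group. Then itertools.product does what its name says.
It yields the Cartesian product of the iterables, i.e. every pairing of an item from the first list with an item from the second: https://docs.python.org/3/library/iterto...ls.product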

from itertools import product


groups_first = [
	[1, 5, 7, 3, 5, 8, 34],
	[1, 2, 4, 3, 5, 8, 45],
	[1, 5, 7, 3, 5, 8, 35],
]

groups_second = [
	[1, 2, 7, 3, 5, 8, 56],
	[1, 5, 7, 3, 5, 8, 23],
	[1, 4, 6, 3, 5, 8, 25],
	[1, 5, 7, 8, 5, 8, 45],
	[1, 3, 7, 3, 5, 8, 15],
	[1, 5, 7, 3, 5, 8, 16],
]


print("Without indices")
for first_group, second_group in product(groups_first, groups_second):
	print(first_group, second_group)

print()
print("With indices")
# to get each group's index within groups_first and groups_second, use enumerate
iterator = product(enumerate(groups_first), enumerate(groups_second))
# each item is a pair of (index, group) tuples, so use nested tuple unpacking
for (first_idx, first_group), (second_idx, second_group) in iterator:
	print(first_idx, first_group, second_idx, second_group)

# this won't work: each item holds two (index, group) tuples, not four values,
# so unpacking it into four names raises a ValueError
#for first_idx, first_group, second_idx, second_group in iterator:
#	print(first_idx, first_group, second_idx, second_group)
Output:
Without indices
[1, 5, 7, 3, 5, 8, 34] [1, 2, 7, 3, 5, 8, 56]
[1, 5, 7, 3, 5, 8, 34] [1, 5, 7, 3, 5, 8, 23]
[1, 5, 7, 3, 5, 8, 34] [1, 4, 6, 3, 5, 8, 25]
[1, 5, 7, 3, 5, 8, 34] [1, 5, 7, 8, 5, 8, 45]
[1, 5, 7, 3, 5, 8, 34] [1, 3, 7, 3, 5, 8, 15]
[1, 5, 7, 3, 5, 8, 34] [1, 5, 7, 3, 5, 8, 16]
[1, 2, 4, 3, 5, 8, 45] [1, 2, 7, 3, 5, 8, 56]
[1, 2, 4, 3, 5, 8, 45] [1, 5, 7, 3, 5, 8, 23]
[1, 2, 4, 3, 5, 8, 45] [1, 4, 6, 3, 5, 8, 25]
[1, 2, 4, 3, 5, 8, 45] [1, 5, 7, 8, 5, 8, 45]
[1, 2, 4, 3, 5, 8, 45] [1, 3, 7, 3, 5, 8, 15]
[1, 2, 4, 3, 5, 8, 45] [1, 5, 7, 3, 5, 8, 16]
[1, 5, 7, 3, 5, 8, 35] [1, 2, 7, 3, 5, 8, 56]
[1, 5, 7, 3, 5, 8, 35] [1, 5, 7, 3, 5, 8, 23]
[1, 5, 7, 3, 5, 8, 35] [1, 4, 6, 3, 5, 8, 25]
[1, 5, 7, 3, 5, 8, 35] [1, 5, 7, 8, 5, 8, 45]
[1, 5, 7, 3, 5, 8, 35] [1, 3, 7, 3, 5, 8, 15]
[1, 5, 7, 3, 5, 8, 35] [1, 5, 7, 3, 5, 8, 16]

With indices
0 [1, 5, 7, 3, 5, 8, 34] 0 [1, 2, 7, 3, 5, 8, 56]
0 [1, 5, 7, 3, 5, 8, 34] 1 [1, 5, 7, 3, 5, 8, 23]
0 [1, 5, 7, 3, 5, 8, 34] 2 [1, 4, 6, 3, 5, 8, 25]
0 [1, 5, 7, 3, 5, 8, 34] 3 [1, 5, 7, 8, 5, 8, 45]
0 [1, 5, 7, 3, 5, 8, 34] 4 [1, 3, 7, 3, 5, 8, 15]
0 [1, 5, 7, 3, 5, 8, 34] 5 [1, 5, 7, 3, 5, 8, 16]
1 [1, 2, 4, 3, 5, 8, 45] 0 [1, 2, 7, 3, 5, 8, 56]
1 [1, 2, 4, 3, 5, 8, 45] 1 [1, 5, 7, 3, 5, 8, 23]
1 [1, 2, 4, 3, 5, 8, 45] 2 [1, 4, 6, 3, 5, 8, 25]
1 [1, 2, 4, 3, 5, 8, 45] 3 [1, 5, 7, 8, 5, 8, 45]
1 [1, 2, 4, 3, 5, 8, 45] 4 [1, 3, 7, 3, 5, 8, 15]
1 [1, 2, 4, 3, 5, 8, 45] 5 [1, 5, 7, 3, 5, 8, 16]
2 [1, 5, 7, 3, 5, 8, 35] 0 [1, 2, 7, 3, 5, 8, 56]
2 [1, 5, 7, 3, 5, 8, 35] 1 [1, 5, 7, 3, 5, 8, 23]
2 [1, 5, 7, 3, 5, 8, 35] 2 [1, 4, 6, 3, 5, 8, 25]
2 [1, 5, 7, 3, 5, 8, 35] 3 [1, 5, 7, 8, 5, 8, 45]
2 [1, 5, 7, 3, 5, 8, 35] 4 [1, 3, 7, 3, 5, 8, 15]
2 [1, 5, 7, 3, 5, 8, 35] 5 [1, 5, 7, 3, 5, 8, 16]
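
To go from the pairs to the actual test, something like the following might work. It is only a sketch and assumes SciPy is installed; scipy.stats.mannwhitneyu is SciPy's function for this test. The names letter_groups, number_groups and results are made up for this example, and the dicts are there only so each result keeps the label of the groups it came from.

```python
from itertools import product

from scipy.stats import mannwhitneyu

# Hypothetical names: dicts keep a label next to each data set.
letter_groups = {
    "group_a": [1, 5, 7, 3, 5, 8, 34],
    "group_b": [1, 2, 4, 3, 5, 8, 45],
    "group_c": [1, 5, 7, 3, 5, 8, 35],
}

number_groups = {
    "group_1": [1, 2, 7, 3, 5, 8, 56],
    "group_2": [1, 5, 7, 3, 5, 8, 23],
    "group_3": [1, 4, 6, 3, 5, 8, 25],
    "group_4": [1, 5, 7, 8, 5, 8, 45],
    "group_5": [1, 3, 7, 3, 5, 8, 15],
    "group_6": [1, 5, 7, 3, 5, 8, 16],
}

# One entry per (letter group, number group) pair.
results = {}
pairs = product(letter_groups.items(), number_groups.items())
for (letter_name, letter_data), (number_name, number_data) in pairs:
    # Two-sided test is an assumption here; pick the alternative you need.
    outcome = mannwhitneyu(letter_data, number_data, alternative="two-sided")
    results[(letter_name, number_name)] = (outcome.statistic, outcome.pvalue)

for (letter_name, number_name), (u_stat, p_value) in results.items():
    print(f"{letter_name} vs {number_name}: U={u_stat:.1f}, p={p_value:.3f}")
```

Adding more letter or number groups then only means adding entries to the dicts; the loop stays the same.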
Reply
#3
Hi,

Many thanks for this; it may be just what I need to get through this.

Thanks again.
Reply

