How to randomize a list with equal probability

radraw · Nov-04-2022, 05:44 AM

Hi, I have a question about randomizing images with equal probability.

I have 24 images and I want to generate 150 sets of 9 images from the 24 images without repetition in the same set.

I have the two scripts below, but I am not getting desired results.
Please take a look and help me with this.

'''
This script is used to generate condition files for 150 participants, where each participant is presented with 9 images
randomly drawn from a set of 24 images. Each of the images are eventually presented 50 times.
'''

# import the libraries needed
import pandas as pd
from numpy.random import shuffle
from psychopy import core

# list of images - currently with generic naming convention - researcher would need to change the names on line 14
# researcher needs to change it to the appropriate file names (i.e. file name + extension)
# if images are in a subfolder, the path must also be included/specified e.g. stimuli/image1.jpg where stimuli is the name of the subfolder and image1.jpg is within it
images = ['im1', 'im2', 'im3', 'im4', 'im5', 'im6', 'im7', 'im8', 'im9', 'im10', 'im11', 'im12', 'im13', 'im14', 'im15', 'im16', 'im17', 'im18','im19','im20','im21','im22','im23','im24']

# parameters 
ims_per_participant = 9 # since there are 24 images to use in total and each participant sees 9 randomly from this set
n_participants = 150 # and it is projected that there will be 150 participants, with each image presented at least 50 times

# repeat the splitting and randomising process 75 times (75 shuffles * 2 sets = 150 sets)
for i in range(75): 
    shuffle(images) # randomise the images
    p1 = {'images' : images[0:9]} # select the first 9 images
    p2 = {'images' : images[-9:]} # select the last 9 images
    ps = [p1, p2] # the resulting separated images
    for pi, p in enumerate(ps): # save the separated images by numbering them
        p = pd.DataFrame.from_dict(p)
        p.to_excel('condition_file_'+str(i+1)+'_part'+str(pi+1)+'.xlsx', index = False)

#script2

import random
#list of images
images = ['im1', 'im2', 'im3', 'im4', 'im5', 'im6', 'im7', 'im8', 'im9', 'im10', 'im11', 'im12', 'im13', 'im14', 'im15', 'im16', 'im17', 'im18','im19','im20','im21','im22','im23','im24']
# using the sample() method
for i in range(150):
    sample_images = random.sample(images, k=9)
    # display random selections without repetition
    print(sample_images)

**deanhystad** · (This post was last modified: Nov-04-2022, 06:09 PM by deanhystad.)

Your math is off.

50*24 = 1200
9 * 150 = 1350

Should it be 8 images?

If it is 8 images, you could shuffle the 24 images and split them into 3 groups of 8, then repeat 50 times. That is very similar to what you were doing in the first example. random.sample only approaches an equal distribution over a very large number of groups (say 150,000,000). With a sample as small as 150 groups, the counts look a lot less equal.
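A minimal sketch of the shuffle-and-split idea, assuming the set size really is 8. The names `SET_SIZE`, `ROUNDS`, and `deck` are chosen here for illustration and do not come from the original scripts.

```python
import random
from collections import Counter

images = [f"im{n}" for n in range(1, 25)]  # same generic names as in the question
SET_SIZE = 8   # assumed set size; 24 is evenly divisible by 8
ROUNDS = 50    # 50 shuffles * 3 sets per shuffle = 150 sets

sets = []
for _ in range(ROUNDS):
    deck = random.sample(images, k=len(images))  # shuffled copy of all 24 images
    # cut the shuffled deck into consecutive groups of SET_SIZE
    sets.extend(deck[i:i + SET_SIZE] for i in range(0, len(deck), SET_SIZE))

counts = Counter(image for s in sets for image in s)
print(len(sets), set(counts.values()))  # 150 {50}
```

Because every shuffle uses each image exactly once, each image ends up in exactly 50 sets and no set contains a repeat.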

woooee · Nov-04-2022, 04:59 PM

Use random.shuffle and process sequentially to avoid repetition.

bowlofred · Nov-04-2022, 06:11 PM

(Nov-04-2022, 05:44 AM)radraw Wrote: but I am not getting desired results.

How so? What's wrong with script2? Looks good to me.

**deanhystad** · (This post was last modified: Nov-04-2022, 06:31 PM by deanhystad.)

Quote:Each of the images are eventually presented 50 times.

Unfortunately that is impossible given the constraints. 9 images per set * 150 sets does not equal 24 images * 50 times. The math does work out for 8 images per set.

Either the set size has to be 8 or the requirement is "Each of the images are eventually presented at least 50 times." If the set size is 9, the problem is tricky. You cannot randomly shuffle and process sequentially because the number of images is not evenly divisible by the set size.

**Gribouillis** · Nov-05-2022, 06:20 AM

Script 2 looks good to me. Each image will appear on average 56.25 times with a standard deviation of 5.93 times.
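These figures follow from treating each image's count as binomial: each of the 150 sets independently contains a given image with probability 9/24. A quick check, with variable names chosen here for illustration:

```python
from math import sqrt

n_sets = 150          # number of sets drawn
p = 9 / 24            # chance that a given image lands in one set
mean = n_sets * p
std = sqrt(n_sets * p * (1 - p))

print(round(mean, 2), round(std, 2))  # 56.25 5.93
```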

Pedroski55 · Nov-05-2022, 11:14 AM

import itertools

images = ['im1', 'im2', 'im3', 'im4', 'im5', 'im6', 'im7', 'im8',
          'im9', 'im10', 'im11', 'im12', 'im13', 'im14', 'im15',
          'im16', 'im17', 'im18','im19','im20','im21','im22','im23','im24']

com_set = itertools.combinations(images, 9)
mylist = list(com_set)
len(mylist) # 1307504

# just pick randomly from mylist, you have 1 307 504 choices!

**deanhystad** · (This post was last modified: Nov-05-2022, 12:52 PM by deanhystad.)

The problem with using combinations is that the order is not random. After seeing as few as 3-4 sets, even if they are randomly selected, you'll begin to predict which image might come next.

from itertools import combinations

for combination in combinations((1, 2, 3, 4), 3):
    print(combination)

(1, 2, 3)
(1, 2, 4)
(1, 3, 4)
(2, 3, 4)

**deanhystad** · Nov-05-2022, 01:15 PM

(Nov-05-2022, 06:20 AM)Gribouillis Wrote: Script 2 looks good to me. Each image will appear on average 56.25 times with a standard deviation of 5.93 times.

150 is a small sample size. Even a nice flat random distribution will result in some sets where some images appear much more often than others. Just repeating the exercise 100 times I see a factor of x2 difference between the least and most common image.

I think sample would work fine for my purposes, but it does not deliver "Each of the images are eventually presented 50 times."

from itertools import chain
from collections import Counter
from random import sample

images = list(range(1, 25))

max_ = 0
min_ = 150
diff = 0
for _ in range(100):
    sets = [sample(images, 9) for _ in range(150)]
    counts = Counter(chain(*sets)).most_common()
    max_ = max(max_, counts[0][1])
    min_ = min(min_, counts[-1][1])
    diff = max(diff, (counts[0][1] - counts[-1][1]))

print(max_, min_, diff)

Output:
78 37 35

**Gribouillis** · (This post was last modified: Nov-05-2022, 02:39 PM by Gribouillis.)

(Nov-05-2022, 01:15 PM)deanhystad Wrote: Just repeating the exercise 100 times I see a factor of x2 difference between the least and most common image.

Your tests are biased because you accumulate the mins and the max. Such a difference on a single draw of 150 sets is extremely unlikely. The number of occurrences of an image in the set follows approximately a binomial law, which can be approximated by the normal law, which satisfies the 68-95-99.7% rule. It means that approximately 0.3% of the counts will differ from the average 56 by more than 3 times the standard deviation, which means 3 * 6 = 18. So 99.7% of the counts will be between 56-18 and 56+18.

Of course, if you draw 100 batches of 150 sets as you do, a few counts will be extremal.
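The difference between the spread within a single draw and extremes accumulated over many draws can be checked with a sketch like the following. The helper `draw_counts` is hypothetical and written only for this illustration. No expected output is shown because the numbers vary from run to run.

```python
from collections import Counter
from itertools import chain
from random import sample

images = list(range(1, 25))

def draw_counts(n_sets=150, set_size=9):
    """Count how often each image appears in one draw of n_sets sets."""
    sets = [sample(images, set_size) for _ in range(n_sets)]
    counts = Counter(chain.from_iterable(sets))
    return [counts[image] for image in images]  # 0 for an image never drawn

draws = [draw_counts() for _ in range(100)]

# spread (max - min) inside each individual draw
per_draw_ranges = [max(c) - min(c) for c in draws]
# extremes accumulated across all 100 draws
overall_max = max(max(c) for c in draws)
overall_min = min(min(c) for c in draws)

print("median single-draw range:", sorted(per_draw_ranges)[len(per_draw_ranges) // 2])
print("accumulated max and min:", overall_max, overall_min)
```

The accumulated gap can never be smaller than the widest single-draw range, so it overstates what one participant pool of 150 sets would typically show.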

More constructively, what do you suggest to satisfy the "at least 50 occurrences" requirement?
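One possible approach, offered as a sketch and not as anything proposed in the thread: build each set from the 9 images used least often so far, breaking ties at random. This keeps all counts within 1 of each other, so with 9 * 150 = 1350 presentations every image appears 56 or 57 times. The trade-off is that the sets are no longer independent uniform samples.

```python
import random
from collections import Counter

images = [f"im{n}" for n in range(1, 25)]
SET_SIZE = 9
N_SETS = 150

counts = Counter({image: 0 for image in images})
sets = []
for _ in range(N_SETS):
    # Rank images by how often they have been used so far; the random
    # second key breaks ties between equally used images.
    ranked = sorted(images, key=lambda image: (counts[image], random.random()))
    chosen = ranked[:SET_SIZE]
    random.shuffle(chosen)   # randomise presentation order within the set
    counts.update(chosen)
    sets.append(chosen)

random.shuffle(sets)  # decouple participant number from construction order
print(min(counts.values()), max(counts.values()))  # 56 57
```

Whether the dependence between consecutive sets matters would depend on the experimental design, so this is worth checking before using it for real condition files.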
