Python Forum
Regarding how to randomize a list with equal probability
#1
Hi, I have a question about randomizing images with equal probability.

I have 24 images and I want to generate 150 sets of 9 images each, drawn from those 24, with no image repeated within a set.

I have the two scripts below, but I am not getting the desired results.
Please take a look and help me with this.


'''
This script is used to generate condition files for 150 participants, where each participant is presented with 9 images
randomly drawn from a set of 24 images. Each of the images are eventually presented 50 times.
'''



# import the libraries needed
import pandas as pd
from numpy.random import shuffle
from psychopy import core

# list of images - currently with generic naming convention - researcher would need to change the names on line 14
# researcher needs to change it to the appropriate file names (i.e. file name + extension)
# if images are in a subfolder, the path must also be included/specified e.g. stimuli/image1.jpg where stimuli is the name of the subfolder and image1.jpg is within it
images = ['im1', 'im2', 'im3', 'im4', 'im5', 'im6', 'im7', 'im8', 'im9', 'im10', 'im11', 'im12', 'im13', 'im14', 'im15', 'im16', 'im17', 'im18','im19','im20','im21','im22','im23','im24']

# parameters 
ims_per_participant = 9 # since there are 24 images to use in total and each participant sees 9 randomly from this set
n_participants = 150 # and it is projected that there will be 150 participants, with each image presented at least 50 times

# repeat the splitting and randomising process 50 times
for i in range(75): 
    shuffle(images) # randomise the images
    p1 = {'images' : images[0:12]} # select the first 9 images
    p2 =  {'images' : images[13:24]} # select the last 9 images
    ps = [p1, p2] # the resulting separated images
    for pi, p in enumerate(ps): # save the separated images by numbering them
        p = pd.DataFrame.from_dict(p)
        p.to_excel('condition_file_'+str(i+1)+'_part'+str(pi+1)+'.xlsx', index = False)
# script 2
import random

# list of images
images = ['im1', 'im2', 'im3', 'im4', 'im5', 'im6', 'im7', 'im8', 'im9', 'im10', 'im11', 'im12', 'im13', 'im14', 'im15', 'im16', 'im17', 'im18', 'im19', 'im20', 'im21', 'im22', 'im23', 'im24']

# using the sample() method
for i in range(150):
    # draw 9 distinct images (no repetition within a set)
    sample_images = random.sample(images, k=9)
    print(sample_images)
#2
Your math is off.

50*24 = 1200
9 * 150 = 1350

Should it be 8 images?

If it is 8 images, you could shuffle the 24 images and split them into 3 groups of 8, then repeat that 50 times. That is very similar to what you were doing in the first example. random.sample would only give you a near-equal distribution if you drew something like 150,000,000 groups. At the small sample size of 150 groups it looks a lot less equal.
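A minimal sketch of that shuffle-and-split idea, assuming the set size really is 8. The function name `make_balanced_sets` and the placeholder image names are only for illustration; they are not from the original scripts.

```python
import random
from collections import Counter

# Placeholder names, matching the generic naming used in the thread
images = [f"im{n}" for n in range(1, 25)]

SET_SIZE = 8    # assumption: 8 images per set instead of 9
N_ROUNDS = 50   # 50 shuffles x 3 groups per shuffle = 150 sets


def make_balanced_sets(images, set_size, n_rounds):
    """Shuffle the full list, cut it into consecutive groups, repeat."""
    sets = []
    for _ in range(n_rounds):
        deck = list(images)      # copy so the caller's list is untouched
        random.shuffle(deck)
        for start in range(0, len(deck), set_size):
            sets.append(deck[start:start + set_size])
    return sets


sets = make_balanced_sets(images, SET_SIZE, N_ROUNDS)
counts = Counter(img for s in sets for img in s)

print(len(sets))              # 150
print(set(counts.values()))   # {50}
```

Because every shuffle uses each image exactly once, each image ends up in exactly 50 of the 150 sets, and no set can contain a repeat.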
#3
Use random.shuffle and process sequentially to avoid repetition.
#4
(Nov-04-2022, 05:44 AM)radraw Wrote: but I am not getting desired results.

How so? What's wrong with script2? Looks good to me.
#5
Quote: Each of the images are eventually presented 50 times.

Unfortunately that is impossible given the constraints. 9 images per set * 150 sets = 1350 presentations, which does not equal 24 images * 50 times = 1200. The math does work out for 8 images per set.

Either the set size has to be 8, or the requirement has to become "Each of the images are eventually presented at least 50 times." If the set size is 9, the problem is tricky: you cannot simply shuffle and process sequentially, because the number of images (24) is not evenly divisible by the set size (9).
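One possible way around this for a set size of 9, offered only as a sketch and not as something proposed in this thread: build each set from the images that have been used least so far, breaking ties at random. The name `make_near_balanced_sets` is hypothetical. The final counts never differ by more than 1, so with 1350 presentations over 24 images every image appears 56 or 57 times.

```python
import random


def make_near_balanced_sets(images, set_size, n_sets):
    """Greedy sketch: always draw from the least-used images so far."""
    counts = {img: 0 for img in images}
    sets = []
    for _ in range(n_sets):
        # Sort by usage count; the random second key breaks ties randomly
        order = sorted(images, key=lambda img: (counts[img], random.random()))
        chosen = order[:set_size]
        random.shuffle(chosen)   # randomise the presentation order
        for img in chosen:
            counts[img] += 1
        sets.append(chosen)
    # Consecutive sets are correlated by construction, so shuffle
    # which participant receives which set.
    random.shuffle(sets)
    return sets, counts


images = [f"im{n}" for n in range(1, 25)]   # placeholder names
sets, counts = make_near_balanced_sets(images, set_size=9, n_sets=150)

print(min(counts.values()), max(counts.values()))
```

The trade-off is that the sets are no longer independent uniform draws: which images land in one set depends on what was used before. Whether that matters depends on the experiment.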
#6
Script 2 looks good to me. Each image will appear on average 56.25 times with a standard deviation of 5.93 times.
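For reference, those two numbers follow from treating the count of one image as binomial: each of the 150 sets contains a given image with probability 9/24. A short check (variable names are illustrative):

```python
from math import sqrt

n_sets = 150   # number of sets drawn
p = 9 / 24     # chance that one particular image is in a given set

mean = n_sets * p                   # expected appearances per image
sd = sqrt(n_sets * p * (1 - p))     # binomial standard deviation

print(mean, round(sd, 2))  # 56.25 5.93
```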
#7
import itertools
import random

images = ['im1', 'im2', 'im3', 'im4', 'im5', 'im6', 'im7', 'im8',
          'im9', 'im10', 'im11', 'im12', 'im13', 'im14', 'im15',
          'im16', 'im17', 'im18', 'im19', 'im20', 'im21', 'im22', 'im23', 'im24']

# every possible 9-image subset of the 24 images
com_set = itertools.combinations(images, 9)
mylist = list(com_set)
len(mylist) # 1307504

# just pick randomly from mylist, you have 1 307 504 choices!
one_set = random.choice(mylist)
#8
The problem with using combinations is that the order within each combination is not random. After seeing as few as 3-4 sets, even randomly selected ones, you will begin to predict which image might come next.
from itertools import combinations

for combination in combinations((1, 2, 3, 4), 3):
    print(combination)
Output:
(1, 2, 3)
(1, 2, 4)
(1, 3, 4)
(2, 3, 4)
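To illustrate the ordering point with the 24 images: every tuple from `combinations` keeps the input order, whereas `random.sample` returns its picks in random (selection) order. This is a small sketch with placeholder image names.

```python
import random
from itertools import combinations

images = [f"im{n}" for n in range(1, 25)]   # placeholder names

# combinations() preserves the input order inside every tuple,
# so the very first combination is simply the first 9 images.
first = next(combinations(images, 9))
print(first)

# random.sample() picks 9 distinct images and returns them in
# random order, so no extra shuffle is needed.
one_set = random.sample(images, k=9)
print(one_set)
```

If sets are taken from `combinations` anyway, shuffling each chosen tuple (after converting it to a list) would remove the predictable ordering.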
#9
(Nov-05-2022, 06:20 AM)Gribouillis Wrote: Script 2 looks good to me. Each image will appear on average 56.25 times with a standard deviation of 5.93 times.
150 is a small sample size. Even with a nice flat random distribution, some draws will contain images that appear much more often than others. Just repeating the exercise 100 times I see a factor of x2 difference between the least and most common image.

I think sample would work fine for my purposes, but it does not deliver "Each of the images are eventually presented 50 times."
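from itertools import chain
from collections import Counter
from random import sample

images = list(range(1, 25))

# extremes accumulated over all 100 repetitions
max_ = 0    # highest count of any image in any repetition
min_ = 150  # lowest count of any image in any repetition
diff = 0    # largest max-min spread within a single repetition
for _ in range(100):
    # one repetition: 150 sets of 9 distinct images
    sets = [sample(images, 9) for _ in range(150)]
    # (image, count) pairs sorted from most to least common
    counts = Counter(chain(*sets)).most_common()
    max_ = max(max_, counts[0][1])
    min_ = min(min_, counts[-1][1])
    diff = max(diff, (counts[0][1] - counts[-1][1]))

print(max_, min_, diff)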
Output:
78 37 35
#10
(Nov-05-2022, 01:15 PM)deanhystad Wrote: Just repeating the exercise 100 times I see a factor of x2 difference between the least and most common image.
Your tests are biased because you accumulate the minimum and the maximum over all repetitions. Such a difference on a single draw of 150 sets is extremely unlikely. The number of occurrences of an image across the sets follows a binomial distribution, which can be approximated by a normal distribution, which in turn satisfies the 68-95-99.7% rule. It means that only about 0.3% of the counts will differ from the average of 56 by more than 3 standard deviations, i.e. by more than 3 * 6 = 18. So about 99.7% of the counts will be between 56-18 and 56+18.

Of course, if you repeat the draw of 150 sets 100 times as you do, you are looking at 100 * 24 = 2400 counts, so a few of them will be extreme.
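The 3-standard-deviation estimate above can also be checked against the exact binomial probabilities instead of the normal approximation. A small sketch, with `pmf` as an illustrative helper name:

```python
from math import comb

n, p = 150, 9 / 24   # 150 sets, each contains a given image with p = 9/24


def pmf(k):
    """Exact binomial probability that one image appears exactly k times."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


low, high = 56 - 18, 56 + 18   # the 38..74 band from the post
inside = sum(pmf(k) for k in range(low, high + 1))
outside = 1 - inside

print(f"P({low} <= count <= {high}) = {inside:.4f}")
print(f"P(outside the band)    = {outside:.4f}")
```

Multiplying `outside` by 2400 gives a rough expected number of counts that fall outside the band over 100 repetitions.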

More constructively, what do you suggest to satisfy the "at least 50 occurrences" requirement?