Python Forum
parallel merge sort algorithm with threads
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
parallel merge sort algorithm with threads
#1
Question 
Hello everyone!,
I have an assignment that i need to do and after countless of trails i cant figure it out and would be grateful for help.
Of course I tried it with chat gpt or other ai but i keep feeling their answers is wrong.

I need to design a parallel version of the merge sort algorithm that operates using threads in the following manner:
At the first recursion level (merging two arrays of half the size), a single thread is used.
At the second recursion level (merging two arrays of a quarter of the size), two threads are used.
At the third recursion level (merging four arrays of one-eighth the size), four threads are used.
And so forth.
I need to use the threading library and synchronization mechanisms like mutex and semaphore as needed. The sorting should be done in ascending order, and you are required to sort a list of integers.
Thanks in advance,
yang.

This is what i have done :
import threading
import random
import math

# Semaphore to control the number of active threads
thread_semaphore = threading.Semaphore()  # Start with 1 thread

# Function to merge two halves
def merge(arr, left, middle, right):
    n1 = middle - left + 1
    n2 = right - middle
    L = arr[left:left + n1]
    R = arr[middle + 1:middle + 1 + n2]
    i = j = 0
    k = left
    while i < n1 and j < n2:
        if L[i] <= R[j]:
            arr[k] = L[i]
            i += 1
        else:
            arr[k] = R[j]
            j += 1
        k += 1
    while i < n1:
        arr[k] = L[i]
        i += 1
        k += 1
    while j < n2:
        arr[k] = R[j]
        j += 1
        k += 1

# Function to implement the merge sort
def merge_sort(arr, left, right):
    if left < right:
        middle = (left + right) // 2
        
        # Try to acquire the semaphore
        if thread_semaphore.acquire(blocking=False):
            try:
                # Use threads for sorting
                left_thread = threading.Thread(target=merge_sort, args=(arr, left, middle))
                right_thread = threading.Thread(target=merge_sort, args=(arr, middle + 1, right))
                left_thread.start()
                right_thread.start()
                left_thread.join()
                right_thread.join()
            finally:
                # Release the semaphore
                thread_semaphore.release()
        else:
            # If semaphore can't be acquired, sort sequentially
            merge_sort(arr, left, middle)
            merge_sort(arr, middle + 1, right)
        
        # Merge the results
        merge(arr, left, middle, right)

# Helper function to start the merge sort
def threaded_merge_sort(arr):
    n = len(arr)
    # Set the initial number of threads based on array size
    global thread_semaphore
    thread_semaphore = threading.Semaphore(10)
    merge_sort(arr, 0, n - 1)

# Example usage with random lists
if __name__ == "__main__":
    sizes = [32, 100,300, 1000]
    for size in sizes:
        arr = [random.randint(1, 1000) for _ in range(size)]
        print(f"Original array of size {size}:", arr)
        threaded_merge_sort(arr)
        print(f"Sorted array of size {size}:", arr)
        print('-' * 60)
Reply
#2
Your sort is correct, if not easily readable. Some of the indexing is really confusing. If you don't subtract 1 from len(arr) at the start, you don't need to add 1 in merge. But when I pull all the threading, the results are correct. The problem is not the threading either, though that can be improved. The problem is the semaphore. There is no need for semaphores or mutexes. Let's dissect a sort of [5, 4, 3, 2, 1]:
split thread level
1: [5, 4], [3, 2, 1]
2: ([5], [4]), ([3], [2, 1])
3: ([2], [1])

merge thread level
3: [1, 2]
2: [4, 5], [1, 2, 3]
1: [1, 2, 3, 4, 5]

Notice that none of the left sides or right sides overlap. There is no need to be concerned about the order that the threads at any particular level execute. The only concern each thread must wait for the left and right thread to finish before it starts merging. You are already doing that.

Try removing the semaphore code. I tested it and sorting worked fine for me.

A threaded sort will run slower than an unthreaded sort. Threading allows a task to run without being blocked by another task. This results in a performance boost if one of the tasks is waiting for a resource, but that doesn't happen in sorting.

I did some timing, and these are the results for different sized arrays.
Output:
size, time in seconds. 10 0.004050016403198242 100 0.02199411392211914 1000 0.2438526153564453 10000 4.914689779281616
I removed the threading and ran the test again.
Output:
size, time in seconds. 10 0.0 100 0.0 1000 0.0020036697387695312 10000 0.02199864387512207
A threaded sort spends most of the time making threads, not sorting.
A merge sort could get a boost from multi-processing, but not at the level of granularity mentioned in your post. A multi-processor merge sort would divide arr into evenly sized chunks to be sorted in separate threads, then merge the results of those sorts back into one array.
Reply
#3
Thank you!
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
Photo a.sort() == b.sort() all the time 3lnyn0 1 1,909 Apr-19-2022, 06:50 PM
Last Post: Gribouillis
  Why won't my merge sort will not work? ggg 4 3,399 Feb-03-2021, 05:24 PM
Last Post: deanhystad
  Insertion sort algorithm courtesy of YouTuber Joe James Drone4four 3 2,928 Dec-07-2020, 02:11 PM
Last Post: perfringo

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020