Python Forum
M4 mini cluster to crunch & munch large datasets
#1
I have been using an M2 Mac mini for a year and a half. My mean husband took it away from me and bought me 3 M4 minis to make a cluster! I know, the horrors; please choke back your tears.

I think I am fairly good at writing the code, but I have NEVER done distributed processing, or whatever it is even called.

These are base models with a 2T chip on circuit board. I have run my model with a single mini and it had not finished by the morning, so the goal now is to learn how to make my model divide and be processed by the cluster.

The setup is a main mini (16GB/2TB) with a Thunderbolt 4 cable out the back to each of the other two minis (16GB/256GB), and those two minis are connected to each other for redundancy.

Does anyone know the code, or where I can find something that will start me down this path?

Thanks in advance.
#2
I have been fascinated with Grok (the AI on X), so I asked it the following query:

Quote:setup is Main mini 16Gb/2T with Thunderbolt 4 cable out the back to each of the other 2 minis 16Gb/256Gb and those minis connected to each other for redundancy, would like to distribute processing of application across all three, python

To use Grok, you need an account on 'X' (Twitter); a free one will do. Click on 'Grok' in the side panel and ask your question.
Here's the response to your question:
-------------------------------------
To distribute processing of a Python application across three Mac Minis (one main with 16GB RAM and 2TB storage, and two others with 16GB RAM and 256GB storage) connected via Thunderbolt 4, you’ll need to set up a distributed computing system. Thunderbolt 4 provides high-speed connectivity (up to 40 Gbps), which is excellent for low-latency communication between the machines. Here's a step-by-step approach to achieve this using Python:

Hardware Setup

  1. Main Mini (16GB/2TB):
    • Acts as the "controller" or primary node.
    • Connected via Thunderbolt 4 cables to the two secondary Minis.
  2. Secondary Minis (16GB/256GB):
    • Connected to the Main Mini via Thunderbolt 4.
    • Optionally connected to each other for redundancy (though this is less critical for processing distribution and more for data replication or failover).
  3. Networking:
    • Thunderbolt 4 supports networking via Thunderbolt Bridge. Enable this to allow the Minis to communicate over IP.
      • On each Mac: Go to System Settings > Network > Thunderbolt Bridge > Enable and assign static IPs (e.g., 192.168.2.1 for Main, 192.168.2.2 for Mini 1, 192.168.2.3 for Mini 2).
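Once the static IPs are assigned, it may help to confirm from Python that each node can actually be reached over the Thunderbolt Bridge before installing anything else. The following is a minimal sketch using only the standard library. The `NODES` mapping reuses the example addresses above, the names `is_reachable` and `report` are made up for illustration, and the port you probe is an assumption: something has to be listening on it (for example port 22 if Remote Login is enabled on the Macs).

```python
import socket

# Example addresses from the setup above; adjust to whatever you assigned.
NODES = {
    "main": "192.168.2.1",
    "mini1": "192.168.2.2",
    "mini2": "192.168.2.3",
}


def is_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def report(nodes, port, timeout=1.0):
    """Map each node name to True/False depending on whether it answered."""
    return {name: is_reachable(ip, port, timeout) for name, ip in nodes.items()}
```

Calling `report(NODES, 22)` on the main mini would return a dictionary of node names and booleans. A `False` only means nothing accepted a connection on that port, so it can point to either a network problem or a service that is not running.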

Software Setup

To distribute processing, you’ll need a framework in Python that supports parallelism across multiple machines. Here are two solid options:
  1. Option 1: Use multiprocessing with dask or ray
    • Dask: A flexible library for parallel computing in Python that can scale from a single machine to a cluster.
    • Ray: A distributed computing framework optimized for machine learning and general-purpose tasks.
  2. Option 2: Custom Solution with socket or ZeroMQ
    • If you want full control, you can write a custom client-server model using Python’s socket library or ZeroMQ for communication.
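To give a feel for what the custom route in Option 2 involves, here is a rough sketch of one worker answering one request. It does not use raw `socket` calls or ZeroMQ; it swaps in the standard library's `multiprocessing.connection` module, which wraps sockets and handles message framing and pickling. The function names, the `AUTHKEY` value, and the sum-of-squares task are all placeholders, not part of the original answer.

```python
from multiprocessing.connection import Client, Listener

# Hypothetical shared secret; both sides must use the same value.
AUTHKEY = b"change-me"


def square_sum(numbers):
    """Placeholder for the real per-chunk computation."""
    return sum(n * n for n in numbers)


def serve_one(listener):
    """Worker side: accept one connection, compute on the data, send the result."""
    with listener.accept() as conn:
        numbers = conn.recv()
        conn.send(square_sum(numbers))


def request(address, numbers):
    """Controller side: send a chunk of data to a worker and wait for the answer."""
    with Client(address, authkey=AUTHKEY) as conn:
        conn.send(numbers)
        return conn.recv()
```

On a secondary mini you would create something like `Listener(("192.168.2.2", 6000), authkey=AUTHKEY)` (the port is an arbitrary choice) and call `serve_one` in a loop, while the main mini calls `request(("192.168.2.2", 6000), chunk)`. Scheduling, retries, and failure handling are all left to you with this approach, which is the main argument for a framework.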
For simplicity and robustness, I recommend Dask, as it’s well-suited for distributing Python workloads across multiple machines.
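As a rough idea of what the Dask route looks like, the sketch below splits a job into index ranges and maps them across the workers. It assumes `dask[distributed]` is installed on all three minis with matching Python and package versions, and that a scheduler is reachable at `tcp://192.168.2.1:8786` (the example main-mini address with Dask's default scheduler port). `chunk_ranges`, `process_chunk`, and `run_on_cluster` are made-up names, and the sum of squares merely stands in for the real model.

```python
def chunk_ranges(n_items, n_chunks):
    """Split range(n_items) into n_chunks contiguous (start, stop) pairs."""
    base, extra = divmod(n_items, n_chunks)
    ranges = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional item.
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def process_chunk(bounds):
    """Placeholder workload: replace with the real per-chunk computation."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def run_on_cluster(scheduler_address, n_items, n_chunks):
    """Send each chunk to the Dask cluster and combine the partial results."""
    # Third-party import kept local so the helpers above work without Dask.
    from dask.distributed import Client

    client = Client(scheduler_address)
    try:
        futures = client.map(process_chunk, chunk_ranges(n_items, n_chunks))
        return sum(client.gather(futures))
    finally:
        client.close()
```

With recent Dask versions, the cluster would typically be started by running `dask scheduler` on the main mini and `dask worker tcp://192.168.2.1:8786` on each machine that should do work, after which `run_on_cluster("tcp://192.168.2.1:8786", 1_000_000, 30)` submits the job. Using more chunks than workers tends to balance the load better, since a fast node can pick up extra chunks.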
-------------------------------------
You will then be given step-by-step instructions on how to implement each method, including Python code.
I'd recommend connecting to 'X', starting 'Grok', and asking the above query verbatim.
I think you will be pleasantly surprised.