Python Forum
Multiple start of script
#1
I am trying to speed up a simple search task using all the cores of my server's CPUs. The task is simple: search for a given string inside a huge text datafile. I want to start one process per core, one for each value. I spent some time designing something like a search lookup table: a dictionary with the keys string_data and status. My intention was to do something like this:

if some cpu core is free:
    look in the search lookup table for the first string_data with status == 0
    start the search
    when it finishes, set its status = 1

My problem is that I can't work out how to manage the CPU cores and assign work to them.
Any advice will be helpful.

I have a server with 64 cores (4 CPUs with 16 cores each) running Debian Linux 9 with Python 3.7. The datafile is plain, unstructured text, roughly 10^6 to 10^12 lines with ~10^4 characters per line. It is also "live": it gets updated a couple of times a day. I tried splitting it into smaller files and running the search on those, but after an update the split files are useless.
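The manual status=0/status=1 bookkeeping sketched above is what multiprocessing.Pool already does: it hands the next pending item to whichever worker process is free. A minimal, runnable sketch of that idea (the file name, terms, and worker count here are placeholders, not from the original setup):

```python
import tempfile, os
from multiprocessing import Pool

def search_term(args):
    """Scan the whole file for one term; return (term, first matching line number or None)."""
    path, term = args
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            if term in line:
                return term, lineno
    return term, None

if __name__ == "__main__":
    # build a tiny sample file so the sketch runs as-is
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("alpha line\nbeta line\n")
        path = f.name
    terms = ["alpha", "beta", "gamma"]   # stands in for the string_data list
    with Pool(processes=4) as pool:      # use processes=64 on the server
        results = dict(pool.imap_unordered(search_term,
                                           [(path, t) for t in terms]))
    print(results)
    os.unlink(path)
```

The pool keeps all workers busy until the term list is exhausted, so no explicit per-core status table is needed.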
Reply
#2
Since you are only looking for one record (the first string_data with status=0), multiple cores will only help if you split the text file into multiple parts and pass a separate part to each process. If the first process finds a match, that match is the closest to the beginning of the file. If the first process doesn't find anything, you look at the results from the second process, and so on. You will probably want to use a Manager dictionary and pass it to each process, with key = process number and value = whatever is found. https://pymotw.com/3/multiprocessing/com...ared-state
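A hedged sketch of this suggestion: split the file into byte ranges, give each Process one range, and collect the per-process results in a multiprocessing.Manager dictionary keyed by process number. Chunks overlap by len(needle)-1 bytes so a match spanning a boundary is not missed. Function names here are illustrative, not from the thread:

```python
import os
from multiprocessing import Process, Manager

def scan_chunk(path, start, end, needle, results, proc_no):
    """Each worker re-opens the file, scans only bytes [start, end),
    and records the absolute offset of its first match (or None)."""
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read(end - start)
    pos = data.find(needle)
    results[proc_no] = start + pos if pos != -1 else None

def parallel_find(path, needle, n_procs):
    size = os.path.getsize(path)
    chunk = -(-size // n_procs)     # ceiling division
    overlap = len(needle) - 1       # catch matches that span a chunk boundary
    with Manager() as mgr:
        results = mgr.dict()
        procs = []
        for i in range(n_procs):
            start = i * chunk
            end = min(start + chunk + overlap, size)
            p = Process(target=scan_chunk,
                        args=(path, start, end, needle, results, i))
            p.start()
            procs.append(p)
        for p in procs:
            p.join()
        # the lowest-numbered process with a hit holds the earliest match
        for i in range(n_procs):
            if results.get(i) is not None:
                return results[i]
    return None
```

Checking the results in process-number order implements the "first process wins" rule from the post, since lower-numbered processes cover earlier parts of the file.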
Reply
#3
Thanks for the reply. Maybe it is worth mentioning that string_data is a list of strings: for each string in that list I have to count its occurrences and store all of its file positions, if any. For example, it might be a search for a 20-character string among 100 billion words, where each match should return the file position of its first character. Each string in the list is independent of the previous and next ones, and each can be anywhere from 1 character up to ~10^4 characters long. It is a kind of knowledge/behavior/learning mapping. I had the idea of doing it in passes: the first pass would process the first 63 strings in a row, and the next pass would start from the 64th string in the string_data list (not caring about the previous ones).
At the end of the day it should be a change tracker over millions of strings and their positions in the file over time.
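For the multi-string variant described above (count every occurrence and record the offset of each match's first character), a simple per-string scan with mmap avoids loading the whole file into memory. This is a sketch under the assumption that needles are byte strings; names are illustrative:

```python
import mmap

def positions_of(path, needles):
    """Map each needle (bytes) to the sorted list of all its match offsets."""
    hits = {n: [] for n in needles}
    with open(path, "rb") as f, \
         mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for needle in needles:
            pos = mm.find(needle)
            while pos != -1:
                hits[needle].append(pos)
                # step by 1 so overlapping occurrences are also counted
                pos = mm.find(needle, pos + 1)
    return hits
```

Since each needle is independent, the per-needle loop parallelizes naturally with a Pool (one needle per task), and len(hits[needle]) gives the occurrence count. For change tracking between the twice-daily file updates, one option is to store each run's position lists and diff them against the previous run.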
Reply

