Python Forum
Multiple start of script
#1
I am trying to speed up a simple search task using all the cores of my server's CPUs. The task is simple: search for a given string inside a huge text datafile. I want to start one process per core, one for each value. I spent some time designing something like a search lookup table: a dictionary with the keys string_data and status. My intention was to do something like this:

if some cpu core is free:
    look in the search lookup table for the first string_data with status == 0
    start the search
    when it finishes, set its status = 1

My problem is that I can't work out how to manage the CPU cores and assign work to them.
Any advice will be helpful.

I have a server with 64 cores (4 CPUs with 16 cores each) running Debian Linux 9 with Python 3.7. The datafile is plain, unstructured text, roughly 10^6 to 10^12 lines with ~10^4 characters per line. It is also "live": it gets updated a couple of times a day. I tried splitting it into smaller files and running the search on those, but after an update the split files are useless.
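The manual status=0/status=1 bookkeeping sketched above is what multiprocessing.Pool already does: it hands the next pending item to whichever worker process is free. A minimal, runnable sketch of that idea (the file name, terms, and worker count here are placeholders, not from the original setup):

```python
import tempfile, os
from multiprocessing import Pool

def search_term(args):
    """Scan the whole file for one term; return (term, first matching line number or None)."""
    path, term = args
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            if term in line:
                return term, lineno
    return term, None

if __name__ == "__main__":
    # build a tiny sample file so the sketch runs as-is
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("alpha line\nbeta line\n")
        path = f.name
    terms = ["alpha", "beta", "gamma"]   # stands in for the string_data list
    with Pool(processes=4) as pool:      # use processes=64 on the server
        results = dict(pool.imap_unordered(search_term,
                                           [(path, t) for t in terms]))
    print(results)
    os.unlink(path)
```

The pool keeps all workers busy until the term list is exhausted, so no explicit per-core status table is needed.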
Reply
#2
Since you are only looking for one record (the first string_data with status=0), multiple cores will only help if you split the text file into multiple parts and pass a separate part to each process. If the first process finds a match, that match is the closest to the beginning of the file. If the first process doesn't find anything, you look at the results from the second process, and so on. You will probably want to use a Manager dictionary and pass it to each process, with key = process number and value = whatever is found. https://pymotw.com/3/multiprocessing/com...ared-state
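A hedged sketch of this suggestion: split the file into byte ranges, give each Process one range, and collect the per-process results in a multiprocessing.Manager dictionary keyed by process number. Chunks overlap by len(needle)-1 bytes so a match spanning a boundary is not missed. Function names here are illustrative, not from the thread:

```python
import os
from multiprocessing import Process, Manager

def scan_chunk(path, start, end, needle, results, proc_no):
    """Each worker re-opens the file, scans only bytes [start, end),
    and records the absolute offset of its first match (or None)."""
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read(end - start)
    pos = data.find(needle)
    results[proc_no] = start + pos if pos != -1 else None

def parallel_find(path, needle, n_procs):
    size = os.path.getsize(path)
    chunk = -(-size // n_procs)     # ceiling division
    overlap = len(needle) - 1       # catch matches that span a chunk boundary
    with Manager() as mgr:
        results = mgr.dict()
        procs = []
        for i in range(n_procs):
            start = i * chunk
            end = min(start + chunk + overlap, size)
            p = Process(target=scan_chunk,
                        args=(path, start, end, needle, results, i))
            p.start()
            procs.append(p)
        for p in procs:
            p.join()
        # the lowest-numbered process with a hit holds the earliest match
        for i in range(n_procs):
            if results.get(i) is not None:
                return results[i]
    return None
```

Checking the results in process-number order implements the "first process wins" rule from the post, since lower-numbered processes cover earlier parts of the file.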
Reply
#3
Thanks for the reply. Maybe it is worth mentioning that string_data is a list of strings: for each string in that list I have to count its occurrences and store all of its file positions, if any. For example, it might be a search for a 20-character string among 100 billion words, where each match should return the file position of its first character. Each string in the list is independent of the previous and next ones, and each can be anywhere from 1 character up to ~10^4 characters long. It is a kind of knowledge/behavior/learning mapping. I had the idea of doing it in passes: the first pass would process the first 63 strings in a row, and the next pass would start from the 64th string in the string_data list (not caring about the previous ones).
At the end of the day it should be a change tracker over millions of strings and their positions in the file over time.
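For the multi-string variant described above (count every occurrence and record the offset of each match's first character), a simple per-string scan with mmap avoids loading the whole file into memory. This is a sketch under the assumption that needles are byte strings; names are illustrative:

```python
import mmap

def positions_of(path, needles):
    """Map each needle (bytes) to the sorted list of all its match offsets."""
    hits = {n: [] for n in needles}
    with open(path, "rb") as f, \
         mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for needle in needles:
            pos = mm.find(needle)
            while pos != -1:
                hits[needle].append(pos)
                # step by 1 so overlapping occurrences are also counted
                pos = mm.find(needle, pos + 1)
    return hits
```

Since each needle is independent, the per-needle loop parallelizes naturally with a Pool (one needle per task), and len(hits[needle]) gives the occurrence count. For change tracking between the twice-daily file updates, one option is to store each run's position lists and diff them against the previous run.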
Reply

