Parallelism with return values
#1
Hey everybody,

I'm having some trouble finding an elegant solution to my parallelism problem. The task is quite simple: I get data, I process the data, I output the result. This runs forever, or at least as long as the program is meant to run. Since this is the sole purpose of the program (and in fact of the entire Raspberry Pi it runs on), I don't mind that it uses all the CPU power; I actually want that.

The interesting part is that the processing step is largely unchanging. Basically, some parameters are computed from the data, and then the final result is computed from the input and those params. These two steps differ a lot in runtime: the first takes about 5 s, whereas the second takes only around 0.05 s. Luckily, the params are almost unchanging, since the input data always has the same characteristics.
So what I obviously do is calculate the params once on the initial data and then only apply them in the loop. However, to account for possible drift in the params, I want to recalculate them periodically.

I read a lot about parallelism, concurrency, and asyncio. While I'd love a solution with asyncio, I don't see how non-preemptive scheduling helps with such a CPU-bound task (correct me if I'm wrong). Another simple idea is to just run the recalculation in a separate thread. That works, but because of the GIL it slows down my main processing too much (it needs to run in about 0.2 s max). So I landed at multiprocessing, and I found a working solution, yet I find it very ugly.

The main problem is that the computation of the params actually has to return its result to the main process. My approach is to check in the main loop whether a recalculation is already in progress; if not, I start a new daemon process. Since I can't just join that process in the main loop, I create a thread that joins the process and then updates the parameters. As if that doesn't sound complicated enough, the question is how to transfer the result. Right now I'm using the Value object from the multiprocessing module, but that supports only basic data types or custom ctypes structures. My example code looks like the following (the sleeps are just there to slow everything down a little):
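(A minimal sketch of the structure; the sleeps and the constant result stand in for the real computation, and the names are placeholders.)

```python
import time
import threading
import multiprocessing as mp

background_running = False  # is a recalculation currently in progress?
param = 0                   # the slowly drifting parameter

def recalculate_params(shared_param):
    """Slow recalculation (~5 s), run in a separate process."""
    time.sleep(5)            # stand-in for the real computation
    shared_param.value = 42  # stand-in result

def wait_and_update(process, shared_param):
    """Join the worker process, then publish the new param."""
    global background_running, param
    process.join()
    param = shared_param.value
    background_running = False

def main():
    global background_running
    shared_param = mp.Value("i", 0)

    while True:
        if not background_running:
            background_running = True
            proc = mp.Process(target=recalculate_params,
                              args=(shared_param,), daemon=True)
            proc.start()
            threading.Thread(target=wait_and_update,
                             args=(proc, shared_param), daemon=True).start()

        # fast processing step (~0.05 s) that uses `param`
        time.sleep(0.05)

if __name__ == "__main__":
    main()
```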

Now, thanks for reading this far. My questions are the following: Is this the right approach, or is there a better or more elegant way to handle the concurrency? And second, what about communicating the result? The example works, but that's just a single integer; I have multiple ints, a numpy array, and some more stuff.

Thanks again,
Plexian
#2
You might be able to make use of
https://docs.python.org/3/library/functools.html#functools.cache
or
https://docs.python.org/3/library/functools.html#functools.lru_cache
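For example (a minimal illustration of how lru_cache skips repeated work, not specific to your code):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def slow_square(x):
    print(f"computing {x}...")
    return x * x

slow_square(4)  # body runs
slow_square(4)  # answered from the cache; body is skipped
```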
#3
I'm aware of memoization, but I don't see how that helps me in my scenario.

This caching prevents the function body from being called again when the input parameters are the same as before. But in my case only the output is the same.
Imagine my function computes the determinant of a matrix. The input matrices all vary and most likely never repeat, but I know that their determinant is almost constant. So I can allow myself to compute it less often, but I still want to recompute it periodically to update the value in case it changes a bit.
#4
I didn't read all of your overly long post, so to answer the title: use a multiprocessing Manager list or dictionary to store the return values. https://pymotw.com/3/multiprocessing/communication.html#managing-shared-state I would pass a Manager object to each Process instead of using the global background_running. Be aware that you can check the progress with current_process() and is_alive() [also covered at the above website], possibly in a while loop in the "main" program, although I don't understand enough to make a definite suggestion.
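Something along these lines (a minimal sketch adapted to your loop; names and timings are placeholders):

```python
import time
import multiprocessing as mp

def recalculate_params(shared):
    """Worker process: write results into the Manager dict."""
    time.sleep(5)          # stand-in for the slow computation
    shared["param"] = 42   # any picklable object works here
    shared["running"] = False

if __name__ == "__main__":
    manager = mp.Manager()
    shared = manager.dict(param=0, running=False)

    while True:
        if not shared["running"]:
            shared["running"] = True
            mp.Process(target=recalculate_params,
                       args=(shared,), daemon=True).start()

        # fast processing step using shared["param"]
        time.sleep(0.05)
```

Because the dict proxy is shared, no extra joiner thread is needed: the worker flips the flag itself when it finishes.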
#5
Thanks for the hint! I actually stumbled across the Manager as well, but as far as I understand it, it creates a server-client architecture and sends values across, which is concerning from a performance perspective, and performance is something I sadly have to care about. Nonetheless, I'll have a look into it and see how it goes.

I was able to make it work using the Pipe object (also mentioned on the page you linked), but that uses pickle for serialization, which seems rather slow as well.

As far as I understand it, to achieve actual sharing of memory (a pointer to the same space in memory, no copying), I have to use the Value object and pack my stuff into it by defining my own ctypes structure. Can somebody confirm that, or does anyone have a different idea?
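For concreteness, this is what I mean (a minimal sketch; the fields are made up):

```python
import ctypes
import multiprocessing as mp

class Params(ctypes.Structure):
    """Hypothetical parameter set: scalars plus a fixed-size array."""
    _fields_ = [("offset", ctypes.c_int),
                ("scale", ctypes.c_double),
                ("coeffs", ctypes.c_double * 4)]

def recalculate(shared):
    with shared.get_lock():  # guard against torn reads/writes
        shared.offset = 1
        shared.scale = 0.5
        shared.coeffs[:] = [1.0, 2.0, 3.0, 4.0]

if __name__ == "__main__":
    shared = mp.Value(Params, lock=True)  # lives in shared memory, no pickling
    p = mp.Process(target=recalculate, args=(shared,))
    p.start()
    p.join()
    print(shared.scale, list(shared.coeffs))
```

For the numpy array, a multiprocessing.Array (or multiprocessing.shared_memory on Python 3.8+) wrapped with np.frombuffer should give the same zero-copy behaviour.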

And regarding my entire post, am I correct in saying that multiprocessing is the only way to go?
#6
The best way to speed things up is to improve your algorithm. 100:1 or even 1000:1 speed improvements are not at all uncommon. Unless you have massive parallelism, the speed improvement from multiprocessing is 2:1, 3:1, and so on. If you can't think of any way to improve the algorithm, the next thing I would look into is vectorization. Can you use Python packages that do most of their work in highly optimized C/C++ code, something like numpy?
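A toy illustration of the difference (not your algorithm):

```python
import numpy as np

data = np.random.rand(1_000_000)

# Pure-Python loop: one interpreted iteration per element
total = 0.0
for x in data:
    total += x * x

# Vectorized: the same sum of squares computed inside numpy's C code
total = np.dot(data, data)
```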
#7
How big is the data and how much time would it take to read from a file?

Can you run the updater as a separate process and have it periodically update a file? Then the fast loop just has to check periodically whether the file has been updated and read in the params.
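A minimal sketch of that idea (the filenames are made up; writing to a temp file and renaming keeps readers from ever seeing a half-written file):

```python
import os
import time
import numpy as np

PARAMS_FILE = "params.npy"  # hypothetical filename

def updater():
    """Separate process: recompute the params and atomically replace the file."""
    while True:
        params = np.array([1.0, 2.0, 3.0])  # stand-in for the slow computation
        np.save("params.tmp.npy", params)
        os.replace("params.tmp.npy", PARAMS_FILE)  # atomic rename on POSIX
        time.sleep(5)

def fast_loop():
    params, last_mtime = None, 0.0
    while True:
        try:
            mtime = os.path.getmtime(PARAMS_FILE)
            if mtime > last_mtime:           # reload only when the file changed
                params = np.load(PARAMS_FILE)
                last_mtime = mtime
        except FileNotFoundError:
            pass  # updater hasn't produced a file yet
        # fast processing step using `params`
        time.sleep(0.05)
```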
#8
@deanhystad
Good idea. Right now the algorithm used for updating is far from optimized and only uses numpy in some places (and I'm not even sure it's used correctly).
However, that part doesn't come from me, so I haven't looked into optimizing it yet. To be honest, I doubt I can make it run fast enough to meet my requirement of roughly a tenth of a second, so I'll probably need a solution for the parallelism anyway. I still hope you are right and the optimization yields much better results than I expect; maybe I'll come back to the specifics of the algorithm in a future thread ;)

@bowlofred
Interesting idea. I didn't think of using the file system, since I assumed it would be slower than pickle in all cases. I'll have a look into that; the file size shouldn't matter too much. I'm just a little unsure how to ensure exclusive access to the file handles. When I get around to checking that, I'll report back, thanks!

