Mar-17-2021, 03:39 PM
Thanks for all your input.
I also expected a 100x performance degradation at most. The current code is partially optimized using Numba; the original version took days to compute. The code is a simulation originally written by a physicist, so time is advanced in very small steps throughout the whole simulation function, which I know is not where Python shines. Still, the code is fairly Pythonic where it can be (heavy use of NumPy built-in functions and vectorization).
I did not have a go at PyPy, but I did try Cython. The optimisation gain was similar to the one I got from Numba, still an order of magnitude short of what I would need.
Maybe the question could be rephrased more generally: is it possible for Python to achieve performance similar to C++ (within 10x) for simulations with small time steps? And if so, what should one be aware of?
I profiled the code and identified the main functions responsible for the lack of performance. However, I could not yet understand why these functions don't make use of the available CPU power despite parallelization and such.
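For reference, a minimal sketch of the kind of profiling run described above, using only the standard-library cProfile and pstats modules. The functions simulate and update_state are hypothetical stand-ins (a toy explicit-Euler oscillator), not the actual simulation code, and the step count is an arbitrary assumption.

```python
import cProfile
import io
import pstats


def update_state(x, v, dt):
    # Hypothetical per-step update: explicit Euler for a harmonic oscillator.
    a = -x
    return x + v * dt, v + a * dt


def simulate(n_steps, dt=1e-3):
    # Hypothetical simulation loop: many tiny time steps, one call per step.
    x, v = 1.0, 0.0
    for _ in range(n_steps):
        x, v = update_state(x, v, dt)
    return x, v


def profile_simulation(n_steps=100_000):
    # Collect timing data only while the simulation itself is running.
    profiler = cProfile.Profile()
    profiler.enable()
    simulate(n_steps)
    profiler.disable()

    # Write the report to a string, sorted by cumulative time per function.
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(5)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_simulation())
```

In a report like this, a function with a very high call count and a tiny time per call usually points to per-step interpreter overhead rather than heavy numerical work, which is one thing to look for when a time-stepping loop does not saturate the CPU.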