Python Forum
Cloud computing advice needed
#1
I'm doing stock backtesting using Python, and I've pretty much maxed out the computational abilities of my local machines. So I'm looking for some cloud computing advice.

Regarding my code, I had a deep dive this morning with a friend of mine who is a data scientist and very experienced in Python. She basically told me that I either need more hardware to do what I want to do, or I need to write it in C. For right now, getting more hardware seems like the path of least resistance compared to redoing all my stuff in C, so...

So there are 3 roads I can go down:

1. Build my own high end custom rig at home with like 128 cores, which for now I'd want to avoid because this will cost thousands up front, and I need better validation of my ideas before I can justify spending thousands on it.

2. Get a VPS or Dedicated server. I can pay monthly here, and there are a lot of good options in the $100-$300/mo range.

3. Use Google Cloud, Azure, or AWS - I know this by far will have the most computing power, but based on some previous experiences with AWS I'm hesitant here because you can wind up with ridiculous bills really fast.

At this point, I'm looking for the most cost effective solution. Suggestions?
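One rough way to compare options 1 and 2 is a break-even calculation: how many months of rent add up to the up-front price of a rig. The sketch below is only an illustration. The $6,000 rig price is a made-up placeholder (the post only says "thousands"), and the calculation ignores electricity, resale value, and any difference in performance between the machines.

```python
import math


def break_even_months(upfront_cost, monthly_rent):
    """Months of renting after which total rent reaches the up-front cost."""
    if monthly_rent <= 0:
        raise ValueError("monthly_rent must be positive")
    return math.ceil(upfront_cost / monthly_rent)


# Hypothetical figure for illustration only; not taken from the thread.
RIG_COST = 6000  # assumed one-time price of a home rig, in dollars

# The $100-$300/mo range is the one quoted for a VPS or dedicated server.
for rent in (100, 200, 300):
    months = break_even_months(RIG_COST, rent)
    print(f"${rent}/mo: break-even after {months} months")
```

If the ideas get validated or dropped well before the break-even point, renting is the cheaper way to find out.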
#2
Have you tried numba?

https://numba.pydata.org/
#3
I've tried both numba and Cython. The backtest model we're playing with has a number of different input parameters, and once you go through all their possible combinations, the backtest needs to be run millions or sometimes tens of millions of times for EACH day. We are trying to figure out which input parameters are optimal, so we're searching for a needle in a haystack. And the haystack just keeps getting bigger, because we need the results to be statistically significant.

The parameters are very varied and different from each other. Even the time series aren't uniform: we are doing things like testing time blocks of different lengths that also happen to be non-sequential. I try to use as many of the built-in Python functions as possible to speed things up, and I have made considerable speed gains doing so, but it's just not enough. We're also trying to keep the parameter ranges as narrow as possible to speed up processing, but this is a double-edged sword: we have found some promising results from parameter combinations that we didn't expect at all, and we never would have found them without exhaustive searching.

There is simply no way to go through all of the combinations of input parameters without multiple nested for loops, and those nested for loops are the performance killer. All of our stuff is run in parallel using joblib. The only thing not done at this point is parallelizing across multiple computers using something like dask distributed. I thought about doing that, but while I have a number of computers lying around, I decided not to try (yet, at least), since most of them are circa 2010-2015 and relatively slow. That's why my data scientist friend told me that I either need to get more computing power or code everything in C, where loops are fast.
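For readers following along, here is a minimal sketch of how an exhaustive search like this can be laid out so that it splits cleanly across workers. It does not reduce the amount of work. It only flattens the nested for loops into one stream of combinations with `itertools.product` and groups them into batches. The parameter names, the ranges, and `toy_backtest` are hypothetical placeholders, not the actual model. The sketch uses only the standard library, and `map_fn` is the assumed hook where a process pool's `map` (or a joblib or dask equivalent) could be plugged in.

```python
from itertools import islice, product

# Hypothetical parameter ranges; the thread does not list the real ones.
PARAM_GRID = {
    "lookback": [5, 10, 20],
    "threshold": [0.5, 1.0],
    "hold_days": [1, 3],
}


def iter_combinations(grid):
    """Yield one dict per parameter combination, replacing nested for loops."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def batched(iterable, size):
    """Group an iterable into lists of at most `size` items."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def toy_backtest(params):
    """Placeholder score; a real backtest would go here."""
    return params["lookback"] * params["threshold"] - params["hold_days"]


def run_batch(batch):
    """Score every combination in one batch; return (score, params) pairs."""
    return [(toy_backtest(params), params) for params in batch]


def search(grid, batch_size=4, map_fn=map):
    """Return the best (score, params) pair.

    map_fn defaults to the serial built-in map; a pool's map can be
    passed instead to spread the batches over several workers.
    """
    chunks = map_fn(run_batch, batched(iter_combinations(grid), batch_size))
    all_pairs = (pair for chunk in chunks for pair in chunk)
    return max(all_pairs, key=lambda pair: pair[0])


best_score, best_params = search(PARAM_GRID)
print(best_score, best_params)
```

Sending batches rather than single combinations keeps the per-task overhead small, which matters when each individual backtest is cheap and there are tens of millions of them.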
#4
https://github.com/hosseinmoein/DataFrame

I just found that while researching. I don’t know much about it beyond the last 30 minutes of reading, but if I go down the C/C++ route, the fact that there is a C++ DataFrame library makes it much, much more palatable.