I am trying to scrape a particular header from several thousand pages across multiple domains, from behind a rather slow HTTPS proxy, and I want to optimize this as best I can. So far, I'm using requests.head() for the actual connection and running it across multiple threads to mitigate the proxy randomly not responding for a few seconds. My next plan is to try requests.Session to see if that makes the proxy happier. The issue is that I'm not sure how to thread that safely. I doubt that Session is thread-safe, but maybe I could assign one Session object per thread? How would I do that?
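To make the per-thread idea concrete, here is a minimal sketch of what I have in mind, assuming threading.local is a reasonable way to give each worker thread its own Session. The proxy address, the header name, and the helper names (get_session, fetch_header, scrape) are placeholders for illustration, not my real setup, and I have not confirmed this is the recommended pattern.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

import requests

# Placeholder values for illustration only (assumptions, not my real config).
PROXIES = {"https": "http://proxy.example:8080"}
HEADER = "Server"

# Each thread sees its own independent attributes on this object.
_local = threading.local()


def get_session():
    """Return the calling thread's Session, creating it on first use."""
    session = getattr(_local, "session", None)
    if session is None:
        session = requests.Session()
        session.proxies.update(PROXIES)
        _local.session = session
    return session


def fetch_header(url):
    """HEAD one URL via this thread's Session; the value is None on failure."""
    try:
        response = get_session().head(url, timeout=10)
        return url, response.headers.get(HEADER)
    except requests.RequestException:
        return url, None


def scrape(urls, workers=16):
    """Fan the URLs out over a thread pool and collect {url: header value}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(fetch_header, urls))
```

Usage would be something like results = scrape(list_of_urls). Since the pool reuses its worker threads, each thread would build its Session once and keep using it for every URL it handles, so no Session is ever shared between threads.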
Am I massively overcomplicating this whole thing, and is there a better way? Opinions, please.