Apr-24-2017, 06:56 PM
Just trying to figure out how to improve the efficiency of a routine that uses Scrapy to fetch URLs.
Although caching is in place, the more URLs I have, the longer the operation takes, and for large lists of URLs the processing time is becoming unacceptable.
Multithreading has brought some benefits, but I'm still far from optimal performance.
What would you do to improve things? I initially thought of using memoization, but since the technique is new to me, I'm not sure whether it would actually help. The idea is that since memoization stores both the call and its computed result, I could avoid hitting the disk every time I check whether a cached URL has already been processed.
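To illustrate what I have in mind (just a sketch, not my actual spider code — check_disk_cache here is a hypothetical stand-in for the expensive on-disk lookup):

```python
from functools import lru_cache

def check_disk_cache(url):
    # Hypothetical stand-in for the expensive disk I/O that checks
    # whether a URL is already present in the on-disk cache.
    return url.endswith(".html")

@lru_cache(maxsize=100000)
def is_cached(url):
    # The first call for a given URL hits the disk; any repeated call
    # with the same URL is answered from the in-memory memo table.
    return check_disk_cache(url)

urls = [
    "http://example.com/a.html",
    "http://example.com/b",
    "http://example.com/a.html",  # repeat: served from memory, no disk access
]
results = [is_cached(u) for u in urls]
print(results)                       # [True, False, True]
print(is_cached.cache_info().hits)   # 1
```

So the repeated URL would never touch the disk a second time; cache_info() lets me verify how many lookups were avoided.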
What are your thoughts? Any other advice is really appreciated.