Python Forum

Opinion: how should my scripts cache web download files?
I have a script that downloads files from the web. To be a good web citizen, I want to keep a local cache and only re-fetch from the internet if the cache is too old.

I cannot rely on there being a local caching web server.

This is especially important while testing my script, when I might run it dozens of times in a row. I know that the server I am downloading from may block me if I make too many requests.

Speaking of requests, for reasons I can only use the stdlib, so no third-party solutions, sorry :(

So my idea is to look for a cached file in a known location:
  • If the file doesn't exist, download from the web.
  • If the file exists, but is older than some amount of time, say X minutes, download from the web.
  • If the file exists, and is younger than X minutes, then use the cached file.
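
Roughly, the three cases above might look like the stdlib-only sketch below. The names `is_fresh`, `download`, and `get_data`, the `fetch` parameter, and the 15-minute default are placeholders for illustration, not a settled design. The cache path is left to the caller, since where to put it is still an open question.

```python
import os
import time
import urllib.request


def is_fresh(cache_path, max_age_seconds):
    """Return True if cache_path exists and is younger than max_age_seconds."""
    try:
        mtime = os.path.getmtime(cache_path)
    except OSError:
        # Missing (or unreadable) file: treat it as "not cached".
        return False
    return (time.time() - mtime) < max_age_seconds


def download(url):
    """Fetch url with the stdlib and return the response body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def get_data(url, cache_path, max_age_minutes=15, fetch=download):
    """Return cached bytes if the cache is fresh, else re-download and refresh it.

    `fetch` is a hypothetical hook so the download step can be swapped out
    while testing, without hitting the real server.
    """
    if is_fresh(cache_path, max_age_minutes * 60):
        with open(cache_path, "rb") as f:
            return f.read()
    data = fetch(url)
    with open(cache_path, "wb") as f:
        f.write(data)
    return data
```

The age check uses the file's modification time, so each successful download resets the clock. Setting `max_age_minutes` to 0 forces a re-fetch on every call.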

The cached file will have to persist from one run of the script to the next, but it doesn't have to survive rebooting the computer.

Two questions:

  1. Where should I put the cache? Platform-independent answers preferred.
  2. What is a reasonable value for X minutes? I'm thinking 15 minutes.

Thanks in advance.