Mar-07-2017, 05:57 PM
This is a problem I'm in the middle of solving, and because I don't want to spend forever on it, I'll probably brute force a solution.
Suppose there's a third party website which displays information within a radius (max 25 miles) of a given zip code. Suppose you want to scrape that website, for all the available info for the entire US. You know, so you can analyze it, or whatever.
The site in question isn't the fastest in the world, so just blasting it with thousands of zip codes will take... too long to run.
So the problem is: how do you choose zip codes strategically to cover the entire US, without too much overlap between the radii of those zip codes? This seems like basic geometry to me, but I'm bad at math, so...
What I'm working on now is basically: pick a zip code, then in each of the cardinal directions, the next zip code is as close to 10 miles away as possible without going over. Each request's 25-mile radius then extends 15 miles past the next center (in each direction, for each zip code), so there's a lot of overlap but no uncovered territory, and I'll need to dedupe the results from the external system. Is there a better way to do this, such that there's no uncovered territory and ALSO fewer zip codes involved?
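For concreteness, here's roughly what that walk could look like in Python. Everything here is a sketch under my own assumptions: `ZIP_CENTROIDS` would be a zip-to-centroid lookup you'd build yourself (e.g. from the Census ZCTA gazetteer file), and the `next_zip` helper and its crude direction filter are mine, not from any library.

```python
import math

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def next_zip(current_zip, bearing, centroids, max_step=10.0):
    """From current_zip, pick the zip whose centroid is as close to
    max_step miles away as possible without going over, roughly in the
    given cardinal direction ('N', 'S', 'E', or 'W').

    Returns None if no zip centroid qualifies in that direction.
    """
    lat0, lon0 = centroids[current_zip]
    best, best_dist = None, -1.0
    for z, (lat, lon) in centroids.items():
        if z == current_zip:
            continue
        d = haversine_miles(lat0, lon0, lat, lon)
        if d > max_step:
            continue  # "without going over"
        dlat, dlon = lat - lat0, lon - lon0
        # Crude direction filter: displacement must point mostly along the
        # requested axis. (E/W ignores cos(lat) scaling -- fine for a sketch.)
        ok = {
            "N": dlat > abs(dlon),
            "S": -dlat > abs(dlon),
            "E": dlon > abs(dlat),
            "W": -dlon > abs(dlat),
        }[bearing]
        if ok and d > best_dist:
            best, best_dist = z, d
    return best
```

You'd seed this with one zip code, expand outward in all four directions until `next_zip` comes up empty, and keep a visited set so you don't walk the same centers twice.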
For reference, here's five zip codes, each with a 25(ish) mile radius, near Seattle (you can see that there will be A LOT of requests this way...).