Python Forum

Full Version: US Census site
Hello,

I recently (last night) downloaded a bunch of data files from the US census public data site using a bot.
I thought I had inserted a delay between downloads, but when I just took a look, I guess I forgot to do so.

Now when I try to access the site I'm getting an error:

Quote:Access Denied

You don't have permission to access "http://www.census.gov/geo/maps-data/data/cbf/cbf_counties.html" on this server.

I'd appreciate it if someone could try this URL and verify whether I have been (at least temporarily) banned from the site, or whether it is down.

Thanks
i see this

Quote:I'd appreciate it if someone could try this url and verify that I have been (at least temporarily) banned from the site, or if it is down
You can use a proxy to check from a different IP. Alternatively, you can check one of the "is it down" sites:
http://downforeveryoneorjustme.com/http:...nties.html
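For anyone who wants to run the same check from their own machine, here is a minimal sketch using only the standard library. The function names `describe_status` and `check_url` are made up for this example, and the mapping from status codes to a diagnosis is a rough rule of thumb, not anything the Census site documents. A 403 "Access Denied" only tells you the server answered and refused this client; it does not prove an IP ban.

```python
import urllib.error
import urllib.request


def describe_status(code):
    """Map an HTTP status code to a rough, human-readable diagnosis."""
    if 200 <= code < 400:
        return "reachable"
    if code in (401, 403, 429):
        return "server is up but refusing this client (possible block or rate limit)"
    if code >= 500:
        return "server-side error (site may be down)"
    return "other client error"


def check_url(url, timeout=10):
    """Request only the headers of `url` and return (status_code, diagnosis)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            code = response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is still useful.
        code = err.code
    except urllib.error.URLError as err:
        # No HTTP response at all: DNS failure, refused connection, timeout, ...
        return None, f"no HTTP response at all ({err.reason})"
    return code, describe_status(code)
```

Calling `check_url(...)` on the page in question from two different networks and comparing the results gives roughly the same answer as the proxy suggestion.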
Thanks! Hope it's not for long.
I for one get a new IP every 24 hours, so that ban would be temporary at best
What is a normal delay for such a task?
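There is no official figure in this thread, so the number below is only a placeholder. As a rough sketch of what "a delay between downloads" could look like, assuming a plain list of file URLs: `download_all` and its parameters are hypothetical names for this example, and `delay_seconds=5.0` is an arbitrary default, not a value recommended by the site.

```python
import time
import urllib.request


def download_all(urls, delay_seconds=5.0,
                 fetch=urllib.request.urlretrieve, sleep=time.sleep):
    """Download each URL in turn, pausing between consecutive requests.

    `fetch` and `sleep` are parameters so the loop can be exercised
    without touching the network or actually waiting.
    """
    saved = []
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every download except the first one.
            sleep(delay_seconds)
        filename = url.rsplit("/", 1)[-1]  # last path component as local name
        fetch(url, filename)
        saved.append(filename)
    return saved
```

With 64 files, even a delay of a few seconds adds only minutes to the whole run, which is small next to the transfer time of multi-gigabyte archives.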
how big are these files?
how many of them are there?
what do they say about download policy?
is your IP in USA?
There are several sets of data.
The one that I was downloading consists of about 64 files
Most are under 200 MB each, 4 about 2GB each and the largest about 9GB
The 9GB one is the one that got me expelled. (my punishment is over now)

IP is USA
Never did find any limitations on downloads, not even in the robots.txt file.
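Checking robots.txt can also be done from the script itself. Below is a small sketch using the standard library's `urllib.robotparser`. The helper name `robots_rules`, the user agent `mybot`, and the sample rules are all invented for illustration and are not the actual contents of the Census robots.txt.

```python
from urllib.robotparser import RobotFileParser


def robots_rules(robots_lines, url, user_agent):
    """Return (allowed, crawl_delay) for `url` given the lines of a robots.txt.

    crawl_delay is None when the file sets no Crawl-delay for this agent.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url), parser.crawl_delay(user_agent)


# Hypothetical robots.txt content, for illustration only.
SAMPLE = [
    "User-agent: mybot",
    "Crawl-delay: 10",
    "Disallow: /private/",
]

print(robots_rules(SAMPLE, "https://example.invalid/data/file.zip", "mybot"))
print(robots_rules(SAMPLE, "https://example.invalid/private/x", "mybot"))
```

To read a live file instead of a list of lines, `RobotFileParser` also offers `set_url()` followed by `read()`. A missing `Crawl-delay` does not mean the server has no rate limits; it only means none are declared there.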
(Jan-18-2017, 03:26 AM) Larz60+ Wrote: There are several sets of data.
The one that I was downloading consists of about 64 files
Most are under 200 MB each, 4 about 2GB each and the largest about 9GB
The 9GB one is the one that got me expelled. (my punishment is over now)

IP is USA
never did find any limitations on download, not in robots.txt file

What is the URL of the 9 GB one?

Sign up at AWS. Each new EC2 instance gets a new public IP. Make a Python script that waits until :10 past the hour and starts the download, then waits until :05 of the next hour, launches a new instance, and halts this instance (with a virtual power-off). The new instance selects the next file.
I successfully downloaded the file before I was temporarily banned.
It was the edges file from the Tiger 2016 Geo database located at https://www2.census.gov/geo/tiger/TGRGDB16/
It's actually 7.8 GB, named tlgdb_2016_a_us_edges.gdb.zip

Googling the file name will get you the records layout document
I already see 2 CNAMEs in DNS. Are there any redirects? What IP does the download happen on: 23.217.21.225 or some other?
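Both questions can be answered from Python without extra packages. This is only a sketch: `describe_host` and `final_url` are hypothetical helper names, and whether the alias list actually shows the CNAME chain depends on the platform's resolver, so a dedicated DNS tool may show more.

```python
import socket
import urllib.request


def describe_host(hostname, resolver=socket.gethostbyname_ex):
    """Resolve `hostname` and return its canonical name, aliases and IPv4 addresses.

    `resolver` is a parameter so the function can be tried with canned data.
    """
    canonical, aliases, addresses = resolver(hostname)
    return {"canonical": canonical, "aliases": aliases, "addresses": addresses}


def final_url(url, timeout=10):
    """Follow any HTTP redirects and return the URL that finally answered."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.url
```

If `final_url(...)` returns something different from the URL passed in, a redirect happened, and `describe_host` on the new host name shows which addresses the download would use from this machine. Results can differ between networks because content delivery networks hand out different addresses by location.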