Python Forum
US Census site


US Census site - Larz60+ - Jan-11-2017

Hello,

Last night I downloaded a bunch of data files from the US Census public data site using a bot.
I thought I had inserted a delay between downloads, but when I checked just now, it turns out I forgot to do so.

Now when I try to access the site I'm getting an error:

Quote:Access Denied

You don't have permission to access "http://www.census.gov/geo/maps-data/data/cbf/cbf_counties.html" on this server.

I'd appreciate it if someone could try this URL and verify whether I have been (at least temporarily) banned from the site, or whether it is down.

Thanks


RE: US Census site - metulburr - Jan-11-2017

i see this

Quote:I'd appreciate it if someone could try this url and verify that I have been (at least temporarily) banned from the site, or if it is down
You can use a proxy to check from a different IP. Alternatively, you can use one of the "is it down" checker sites:
http://downforeveryoneorjustme.com/http://www.census.gov/geo/maps-data/data/cbf/cbf_counties.html
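
A rough way to tell "refused" from "down" on your own machine is to look at what comes back: an HTTP error status means a server answered and turned the request away, while a connection error means nothing answered at all. The sketch below assumes the block shows up as an HTTP error status (an "Access Denied" page is typically served with 403, but that is an assumption here). `check_site` and its `opener` parameter are hypothetical names used for illustration, not anything from the thread.

```python
import urllib.error
import urllib.request


def check_site(url, opener=urllib.request.urlopen, timeout=10):
    """Describe whether url is up, refusing the request, or unreachable."""
    try:
        with opener(url, timeout=timeout) as response:
            return f"up (HTTP {response.status})"
    except urllib.error.HTTPError as err:
        # A server answered, but with an error status such as 403 or 503.
        return f"server responded with HTTP {err.code}"
    except (urllib.error.URLError, OSError) as err:
        # DNS failure, refused connection, timeout: nothing answered.
        return f"unreachable ({err})"
```

This only reports what your own IP sees, so it still has to be run from a second network (or compared with a checker site) to confirm a per-IP ban.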


RE: US Census site - Larz60+ - Jan-11-2017

Thanks! Hope it's not for long.


RE: US Census site - Kebap - Jan-11-2017

I, for one, get a new IP every 24 hours, so that ban would be temporary at best.


RE: US Census site - wavic - Jan-11-2017

What is a normal delay for such tasks?
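
The thread does not settle on a number, so the 5-second default below is an arbitrary placeholder, not a recommendation from the Census site. This is only a minimal sketch of the "delay between downloads" idea from the first post; `fetch_politely`, `fetch`, and `sleep` are hypothetical names, and `fetch` stands in for whatever function actually downloads one URL.

```python
import time


def fetch_politely(urls, fetch, delay_seconds=5.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing delay_seconds between requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one.
            sleep(delay_seconds)
        results.append(fetch(url))
    return results
```

Passing `sleep` as a parameter keeps the loop easy to test without actually waiting; in real use the default `time.sleep` is enough.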


RE: US Census site - Skaperen - Jan-18-2017

how big are these files?
how many of them are there?
what do they say about download policy?
is your IP in USA?


RE: US Census site - Larz60+ - Jan-18-2017

There are several sets of data.
The one that I was downloading consists of about 64 files.
Most are under 200 MB each, four are about 2 GB each, and the largest is about 9 GB.
The 9 GB one is the one that got me expelled. (My punishment is over now.)

The IP is in the USA.
I never did find any limitations on downloads, and there is nothing in the robots.txt file.
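
For checking a robots.txt file programmatically, the standard library's `urllib.robotparser` can report whether a URL is allowed and whether a `Crawl-delay` is declared. The sketch below parses a made-up sample, not the actual census.gov file; `robots_rules` and `SAMPLE` are hypothetical names. Note that `Crawl-delay` is a non-standard directive, so its absence does not mean a site has no rate limits.

```python
from urllib.robotparser import RobotFileParser


def robots_rules(robots_txt, user_agent, url):
    """Parse robots.txt text; return (allowed, crawl_delay) for user_agent and url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url), parser.crawl_delay(user_agent)


# Made-up robots.txt for illustration; not the contents of any real site's file.
SAMPLE = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

print(robots_rules(SAMPLE, "my-bot", "https://example.org/data/file.zip"))
print(robots_rules(SAMPLE, "my-bot", "https://example.org/private/x"))
```

To read a live file instead, `RobotFileParser` also offers `set_url()` followed by `read()`.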


RE: US Census site - Skaperen - Jan-18-2017

What is the URL of the 9 GB one?

Sign up at AWS. Each new EC2 instance gets a new public IP. Write a Python script that waits until :10 past the hour and starts the download, then waits until :05 past the next hour, launches a new instance, and halts this instance (with a virtual power-off). The new instance then selects the next file.


RE: US Census site - Larz60+ - Jan-18-2017

I successfully downloaded the file before I was temporarily banned.
It was the edges file from the TIGER 2016 Geodatabase, located at https://www2.census.gov/geo/tiger/TGRGDB16/
It's actually 7.8 GB and is named tlgdb_2016_a_us_edges.gdb.zip

Googling the file name will get you the record layout document.


RE: US Census site - Skaperen - Jan-18-2017

I already see 2 CNAME records in DNS. Are there any redirects? What IP does the download happen on: 23.217.21.225 or some other?
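
One way to look at the name chain and addresses from Python is `socket.gethostbyname_ex`, which returns the canonical name, a list of aliases, and a list of IPv4 addresses. This is only a sketch: `describe_host` and its `resolver` parameter are hypothetical names, and the alias list usually, but not necessarily, reflects the CNAME records the resolver followed.

```python
import socket


def describe_host(hostname, resolver=socket.gethostbyname_ex):
    """Return the canonical name, aliases, and IPv4 addresses for hostname."""
    canonical, aliases, addresses = resolver(hostname)
    return {"canonical": canonical, "aliases": aliases, "addresses": addresses}
```

Calling `describe_host("www2.census.gov")` needs network access, and the addresses it reports can differ by resolver, location, and time, so they may not match the IP quoted above.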