Python Forum
Interesting Datasets
#1
Free public databases.

Everything (Literally):
  • Archive.org - This site holds many millions (if not billions by now) of historical documents, music, downloadable movies, Civil War records, and books. You name it. It is an enormous undertaking to preserve data from historical archives all over the world.
    When searching for information, you may try their search engine, but I find it quite lacking. Instead, use Google and restrict the search to archive.org.
    Example - Suppose you're into genealogy and are looking for a book written in the 1800s named 'Vital records of Beverly, Massachusetts, to the end of the year 1849'.
    Search on Google: 'Vital statistics Beverly Massachusetts site:archive.org'
    Note the site qualifier. All results will be from archive.org. Click on the link: 'Vital records of Beverly, Massachusetts, to the end of the year 1849 ...'
    Note the following on archive.org:
      • A reader will be shown, and you can look up entries in the book online.
      • Below that, you can download the book in any one of the following formats: ABBYY.gz, B/W PDF, DAISY, EPUB, FULL TEXT, KINDLE, PDF, Single page JP2 tar file, Single page Processed JP2 zip file, or Torrent.
      • Information on the book itself, such as where the original is located, copyright information, etc.
    You can also download all of the Charlie Chaplin movies, and find music from the Grateful Dead.
    I just can't say enough about this monumental site.
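    If you want to script the site-qualified search described above, here is a minimal sketch. The function name site_search_url is made up for illustration, and the https://www.google.com/search?q=... URL pattern is an assumption about how Google accepts queries, not something from the post.

```python
from urllib.parse import urlencode


def site_search_url(terms: str, site: str = "archive.org") -> str:
    """Build a search URL whose query is limited to one site via 'site:'."""
    # Append the site qualifier to the search terms, as in the example above.
    query = f"{terms} site:{site}"
    # Assumed URL pattern; urlencode escapes spaces and the colon safely.
    return "https://www.google.com/search?" + urlencode({"q": query})


url = site_search_url("Vital statistics Beverly Massachusetts")
print(url)
```

    Paste the printed URL into a browser; every result should come from archive.org.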
Geographical Mapping:
  • Census TIGER Files - Geography is central to the work of the Census Bureau, providing the framework for survey design, sample selection, data collection, tabulation, and dissemination. Geography provides meaning and context to statistical data. The files are ESRI-format shapefiles.
Nutrition:
Patents:
  • Google US Patent office bulk download - Google and the USPTO entered into an agreement in 2010 to make USPTO bulk data available at no charge. The USPTO now provides access to the data through its Bulk Data Storage System, and as a result Google's data mirror is no longer necessary.
  • European Patent Register - Some downloadable files, lookup available on line.
Securities:
  • ASXlisted - Australian financial market exchange company list
  • mfundslist.txt - US Mutual Fund List
  • otclist.txt - US over the counter companies
  • US Companies Alphabetical List - The link that follows is incomplete. You must append the (upper case) letter that you wish to get: http://www.nasdaq.com/screening/companies-by-name.aspx?letter=
  • finra_eod.txt - The OATS Reportable Security Daily List for End of Day (retrieve after 6:00 p.m. EST)
  • finra_sod.txt - The OATS Reportable Security Daily List for Start of Day (retrieve after 6:30 a.m. EST)
  • AMEX companies by index - US American Stock Exchange Company List.
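For the US Companies Alphabetical List entry above, the letter has to be appended to the incomplete link by hand. A small sketch that builds all 26 URLs might look like this. The helper name company_list_urls is hypothetical, and the code only builds the strings; whether the NASDAQ page still answers at that address is not checked.

```python
from string import ascii_uppercase

# Incomplete link from the post; one upper-case letter must be appended.
BASE_URL = "http://www.nasdaq.com/screening/companies-by-name.aspx?letter="


def company_list_urls(letters: str = ascii_uppercase) -> list[str]:
    """Return one complete URL per letter, forcing upper case as required."""
    return [BASE_URL + letter.upper() for letter in letters]


urls = company_list_urls()
print(urls[0])   # first URL, for the letter A
print(len(urls))
```

Passing a shorter string, such as company_list_urls("abc"), limits the result to those letters.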
Consumer Safety:
  • SaferProducts.gov full download: Reports of harm reported to the Consumer Product Safety Commission (>30k incidents, csv format, plus recalls). Maybe not that interesting, but overseeing this database is what I do in my day job.
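Since that download is in csv format, a first look can be done with the standard library alone. This is only a sketch: summarize_csv is a made-up helper, and the sample rows and column names below are invented stand-ins, because the post does not say what columns the real file has.

```python
import csv
import io


def summarize_csv(handle):
    """Return (column names, number of data rows) for an open CSV file object."""
    reader = csv.DictReader(handle)
    row_count = sum(1 for _ in reader)  # reads the header, then counts rows
    return reader.fieldnames, row_count


# Hypothetical stand-in data, NOT the real SaferProducts.gov layout.
sample = io.StringIO("report_id,product\n1,toaster\n2,stroller\n")
columns, rows = summarize_csv(sample)
print(columns, rows)
```

For a real file, open it with open(path, newline="", encoding="utf-8") and pass the handle in; the encoding is a guess and may need adjusting.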
Genetics:
Music:
  • Million Song Dataset - A freely-available collection of audio features and metadata for a million contemporary popular music tracks.
General:

Please feel free to add to this post, just follow these simple rules:
  • Make sure the dataset isn't already listed here
  • Check the link to make sure it's valid
#2
SaferProducts.gov full download: Reports of harm reported to the Consumer Product Safety Commission (>30k incidents, csv format, plus recalls). Maybe not that interesting, but overseeing this database is what I do in my day job.
Craig "Ichabod" O'Brien - xenomind.com