Python Forum

Full Version: Corpora catalof for NLTK
Hello,

I've been playing with NLTK (again) today. There's quite a good list of available corpora here.
What I have been looking for is a way to get a copy of that list programmatically, without having to
scrape the page.

Does anyone know how to do this?

If not, expect to see a scraper for it soon in snippets.


A catalof is like a loaf of bread, only made with catas.
I am posting some code under snippets that creates a JSON file from the corpora website.
It has the same title (with the spelling corrected).
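
For anyone who wants to avoid scraping the HTML page, here is a rough sketch of one alternative. It is not the snippet mentioned above. It assumes the package index that the NLTK downloader reads is an XML file at the URL in INDEX_URL, and that each <package> element carries id, name, subdir and url attributes; both are assumptions worth checking against the live file. The function names parse_corpora and fetch_corpora and the output file name corpora.json are made up for illustration.

```python
import json
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Assumption: the index file the NLTK downloader uses by default.
INDEX_URL = "https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml"


def parse_corpora(index_xml):
    """Return one dict per <package> element whose subdir is 'corpora'."""
    root = ET.fromstring(index_xml)
    corpora = []
    for pkg in root.iter("package"):
        # Assumption: corpora are marked with subdir="corpora" in the index.
        if pkg.get("subdir") == "corpora":
            corpora.append(
                {
                    "id": pkg.get("id"),
                    "name": pkg.get("name"),
                    "url": pkg.get("url"),
                }
            )
    return corpora


def fetch_corpora(url=INDEX_URL):
    """Download the index and return the parsed list of corpora."""
    with urlopen(url) as response:
        return parse_corpora(response.read())


if __name__ == "__main__":
    # Write the catalog to a JSON file (hypothetical file name).
    with open("corpora.json", "w", encoding="utf-8") as fh:
        json.dump(fetch_corpora(), fh, indent=2)
```

Keeping the parsing separate from the download means parse_corpora can be tried on a saved copy of the index without touching the network.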