Mar-03-2019, 01:43 AM
Thank you. Still, I don't have a completely clear picture. Why would it be a nested dictionary?
getting unique values and counting amounts
Mar-03-2019, 01:50 AM
Almost every dictionary that I create is nested. This is a generic dictionary function that I wrote for my own use, and I wanted it to be able to display any dictionary, nested or not.
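For anyone reading along, here is a minimal sketch of what such a routine might look like. It is not the original function from this thread; the names `format_dict` and `display_dict` and the four-space indentation are assumptions made for illustration. It recurses into nested dictionaries and handles a flat dictionary as the trivial case.

```python
def format_dict(data, indent=0, width=4):
    """Return display lines for a dict, indenting nested dicts one level deeper."""
    lines = []
    pad = " " * (indent * width)
    for key, value in data.items():
        if isinstance(value, dict):
            # Nested dictionary: print the key, then recurse one level deeper.
            lines.append(f"{pad}{key}:")
            lines.extend(format_dict(value, indent + 1, width))
        else:
            # Plain value: print key and value on one line.
            lines.append(f"{pad}{key}: {value}")
    return lines


def display_dict(data):
    """Print any dictionary, nested or not (hypothetical helper)."""
    print("\n".join(format_dict(data)))


if __name__ == "__main__":
    display_dict({"apples": 3, "pears": 1})
    display_dict({"site": {"home": "/", "blog": {"latest": "/blog/latest"}}})
```

Because the flat case is just recursion that never goes deeper, the same function works whether or not the dictionary is actually nested.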
Mar-03-2019, 01:54 AM
So basically we don't really have a nested dictionary in this case, right?
Mar-03-2019, 01:59 AM
Correct.
I use this routine a lot when I'm scraping new sites because I usually build a dictionary, which is actually a partial sitemap of the site I'm scraping. It makes it easier to find my target areas.
Mar-05-2019, 12:17 AM
I'm trying to do something similar here:
When I remove read() I keep getting different sorts of errors. If it would help, I can put them here. Any idea how to fix this line?
Mar-05-2019, 01:23 AM
this line:
replace with:
See: https://www.nltk.org/ You can install it and play with it a bit. For example, to get n-grams in your context, the code would look something like:
Now you can go a step further and get the frequency of each bigram in our corpus:
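Since the code from this post is not shown here, the following is only a rough sketch of the idea, not the original snippet. It uses the standard library alone, so it runs without installing anything; the function names `get_ngrams` and `bigram_frequencies` and the sample sentence are made up for illustration, and splitting on whitespace is a simplifying assumption (a real tokenizer would handle punctuation). With NLTK installed, `nltk.util.ngrams(tokens, 2)` yields the same kind of tuples.

```python
from collections import Counter


def get_ngrams(tokens, n):
    """Return the list of n-grams (as tuples) from a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bigram_frequencies(text):
    """Count how often each bigram occurs in the text."""
    # Naive tokenization: lowercase and split on whitespace (assumption).
    tokens = text.lower().split()
    return Counter(get_ngrams(tokens, 2))


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the quick brown dog"
    for bigram, count in bigram_frequencies(sample).most_common(3):
        print(bigram, count)
```

`Counter` is a dictionary subclass, so the result can be sorted, filtered, or displayed like any other dictionary of counts.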
Here's the tutorial for above code: https://www.kaggle.com/rtatman/tutorial-getting-n-grams
Using only your first 3 lines of code, instead of my previous code that raised the error, gives n-grams with values of 0. Not sure why that is.
Kaggle is a data science site. Can we consider web scraping a part of data science, then? Will definitely take a look at the NLTK stuff, although I don't think that I should spend too much time on this for now... And this code:
The mystery continues... What would happen if I added some real cleanInput and getNgrams functions? Will work on that tomorrow after I study the NLTK documentation.
NLTK is great, but can be a bit tricky at first. I use the O'Reilly book 'Natural Language Processing with Python' as a guide, but I'm sure there are better examples on the web now, as I purchased the book back in 2013 and it was published in 2009. If you wish, I'll take a look and see what's available now, and if someone else is reading this post and knows where to look, that would also help.
found this snippet:
Mar-06-2019, 01:57 PM
Thank you, no need to look for anything newer as I'm not planning to focus that much on this field... at least not for now.
Will check the snippet.
Mar-07-2019, 12:49 AM
Any idea why this argument is invalid?