Python Forum

Impact of words from sentence on popularity
Hi, I have a table with a few thousand rows. One column contains a joke (one or a few sentences) and a second column contains its number of views. I would like to rank words by how often they appear in the best jokes but not in the bad ones, and vice versa. How can I do this in Python? Thank you
Quote: How to do it in Python?
Please show what you have tried.
Yes, code please :-D
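Sure. I have to say that I'm stuck right at the start: so far I can only get whole sentences and their occurrences.
import pandas as pd

# Load the dataset (columns: File, text, views)
df = pd.read_csv('C:/Users/Adam/School/6th semester/project/dataset.csv')
df.info()                                      # column overview (output below)
print(df['text'].value_counts())               # how often each whole sentence occurs
df_test = df.query('text == "test sentence"')  # rows whose text is exactly "test sentence"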
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2827 entries, 0 to 2826
Data columns (total 2 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   File           2827 non-null  object 
 1   text           2827 non-null  object 
 3   views          2827 non-null  int64
I've made some progress!

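from collections import Counter

oneList = list(df['text'])             # all jokes as a list of strings
oneString = ' '.join(oneList)          # join them into one long string
allWords = oneString.lower().split()   # split on whitespace (punctuation stays attached)
count = Counter(allWords)              # word -> number of occurrences
print(count)
Now I'm able to get word frequencies, but unfortunately without taking popularity into account :/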

Any ideas how to distinguish between popular and unpopular? Thanks!
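One possible approach, sketched under assumptions that are not from the thread: jokes with more views than the median count as "popular" and the rest (including ties at the median) as "unpopular", words are lowercased runs of letters and apostrophes, and each word is scored by the log ratio of its add-one-smoothed relative frequency in the two groups. `tokenize` and `rank_words` are hypothetical helper names; the `text` and `views` column names come from the `df.info()` output above.

```python
import math
import re
from collections import Counter
from statistics import median


def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes, so "Joke." and "joke" count as one word
    return re.findall(r"[a-z']+", text.lower())


def rank_words(texts, views, alpha=1.0):
    """Score words by the smoothed log ratio of their relative frequency in
    popular jokes (views above the median) versus all other jokes."""
    cutoff = median(views)
    popular, unpopular = Counter(), Counter()
    for text, v in zip(texts, views):
        # Count each word once per joke so one repetitive joke cannot dominate
        words = set(tokenize(text))
        if v > cutoff:
            popular.update(words)
        else:
            unpopular.update(words)

    vocab = set(popular) | set(unpopular)
    n_pop = sum(popular.values())
    n_unpop = sum(unpopular.values())
    scores = {}
    for w in vocab:
        # Add-alpha smoothing avoids log(0) for words seen in only one group
        p = (popular[w] + alpha) / (n_pop + alpha * len(vocab))
        q = (unpopular[w] + alpha) / (n_unpop + alpha * len(vocab))
        scores[w] = math.log(p / q)

    # Positive scores lean popular, negative scores lean unpopular
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With the DataFrame from above, `ranking = rank_words(df['text'].tolist(), df['views'].tolist())` would give `ranking[:20]` as the words leaning most toward popular jokes and `ranking[-20:]` as those leaning most toward unpopular ones. Words that occur in only one or two jokes can still get extreme scores, so filtering by a minimum count is worth considering.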
Quote:Any ideas how to distinguish between popular and unpopular? Thanks!

I think you will need NLTK's frequency distribution functions or something similar.

Here are a few places to research:
http://www.nltk.org/api/nltk.html?highli...robability
https://python.gotrained.com/frequency-d...n-in-nltk/
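If you would rather use the view counts directly instead of splitting at a cutoff, a rough stdlib-only alternative (not NLTK's `FreqDist`) is to average the views of every joke that contains a word and ignore words that appear in fewer than `min_count` jokes. `mean_views_per_word` and `min_count` are hypothetical names chosen for this sketch.

```python
import re
from collections import defaultdict


def mean_views_per_word(texts, views, min_count=5):
    """Average number of views of the jokes containing each word."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for text, v in zip(texts, views):
        # Each word contributes once per joke, regardless of repetitions
        for w in set(re.findall(r"[a-z']+", text.lower())):
            totals[w] += v
            counts[w] += 1
    # Skip rare words: an average over one or two jokes is mostly noise
    means = {w: totals[w] / counts[w] for w in counts if counts[w] >= min_count}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)
```

For example, `mean_views_per_word(df['text'], df['views'])[:20]` would list the words whose jokes average the most views. A few viral jokes can pull a mean up sharply, so comparing against the overall average views is a useful sanity check.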