Python Forum

Hi, I have a table with a few thousand rows. One column holds a joke of one or a few sentences, and the second holds the number of views. I would like to rank the words that are used mostly in the best jokes but not in the bad ones, and the reverse. How to do it in Python? Thank you

Quote:How to do it in Python?

Please show what you have tried.

Yes, code please :-D

Sure. I have to say that I'm stuck right at the start: so far I'm only able to get whole sentences and their occurrence counts.

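import pandas as pd
df = pd.read_csv('C:/Users/Adam/School/6th semester/project/dataset.csv')
# Count how often each whole joke text occurs
df['text'].value_counts()
# Select the rows whose text exactly matches a given sentence
df_test = df.query('text == "test sentence"')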

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2827 entries, 0 to 2826
Data columns (total 2 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   File           2827 non-null  object 
 1   text           2827 non-null  object 
 3   views          2827 non-null  int64

I made some progress!

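from collections import Counter

# Join all joke texts into one string, lowercase it, and split on whitespace
oneList = list(df['text'])
oneString = ' '.join(oneList)
allWords = oneString.lower().split()
# Count how often each word occurs across all jokes
count = Counter(allWords)
print(count)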

Now I'm able to get the frequency of words, but unfortunately without taking popularity into account :/

Any ideas how to distinguish between popular and unpopular? Thanks!

Quote:Any ideas how to distinguish between popular and unpopular? Thanks!

I think you will need NLTK's frequency distribution functions or something similar.

Here are a few places to research:
http://www.nltk.org/api/nltk.html?highli...robability
https://python.gotrained.com/frequency-d...n-in-nltk/
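In case it helps, here is a minimal sketch of one possible way to bring the views into the word counts, using only the standard library rather than NLTK. It assumes a joke is "popular" when its views are above the median, and it scores each word by how much more often it shows up in popular jokes than in unpopular ones. The function name word_popularity_scores, the min_count parameter and the toy data are made up for illustration; they are not from the thread or from any library.

```python
from collections import Counter
from statistics import median


def word_popularity_scores(texts, views, min_count=2):
    """Score words by how strongly they lean towards popular jokes.

    Assumption: a joke is "popular" if its views are above the median.
    Returns (word, score) pairs sorted from most popular-leaning to most
    unpopular-leaning. The score lies in [-1, 1].
    """
    threshold = median(views)
    popular, unpopular = Counter(), Counter()
    for text, n_views in zip(texts, views):
        # Count each word once per joke so one repetitive joke cannot dominate
        words = set(text.lower().split())
        (popular if n_views > threshold else unpopular).update(words)

    scores = {}
    for word in popular.keys() | unpopular.keys():
        total = popular[word] + unpopular[word]
        if total < min_count:
            continue  # too rare to say anything about
        # +1: only in popular jokes, -1: only in unpopular jokes
        scores[word] = (popular[word] - unpopular[word]) / total
    # Highest score first; ties are broken alphabetically
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy data standing in for df['text'] and df['views']
texts = [
    "the cat sat on the mat",
    "a cat walks into a bar",
    "the dog sat on the log",
    "a dog walks into a vet",
]
views = [900, 800, 10, 20]

ranking = word_popularity_scores(texts, views)
print(ranking[:3])   # words leaning towards popular jokes
print(ranking[-3:])  # words leaning towards unpopular jokes
```

With the DataFrame from earlier in the thread, the call would presumably be word_popularity_scores(df['text'], df['views']). Splitting on whitespace leaves punctuation attached to words, so a proper tokenizer and a stop-word list would likely give cleaner results.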

dvorak

Larz60+

DeaD_EyE

dvorak

dvorak

Larz60+