how often does a word occur in this column?

Jack_Sparrow · Jun-10-2018, 01:50 PM

**Larz60+** · Jun-10-2018, 03:19 PM

Use collections:

from collections import Counter


data = 'Genres\nAction|Adventure\nAction|Adventure|Animation\nAction|Adventure|Animation|Comedy|Drama\n' \
    'Action|Adventure|Animation|Comedy|Family\nAction|Adventure|Animation|Drama|Family\n' \
    'Action|Adventure|Animation|Family\nAction|Adventure|Animation|Family|Fantasy\n' \
    'Action|Adventure|Animation|Family|Mystery\nAction|Adventure|Animation|Family|Science Fiction\n' \
    'Action|Adventure|Animation|Fantasy\nAction|Adventure|Animation|Fantasy|Horror\n' \
    'Action|Adventure|Animation|Fantasy|Science Fiction\n'

data_list = data.strip().split('|')
print(Counter(data_list).most_common(1))

Results:

Output:
[('Adventure', 11)]

ljmetzger · (This post was last modified: Jun-10-2018, 05:56 PM by ljmetzger.)

Great answer @Larz60+. I think the newlines should be replaced by vertical bars before parsing (see 'Adventure\n' at the end of the first data row).

from collections import Counter

data = 'Genres\nAction|Adventure\nAction|Adventure|Animation\nAction|Adventure|Animation|Comedy|Drama\n' \
    'Action|Adventure|Animation|Comedy|Family\nAction|Adventure|Animation|Drama|Family\n' \
    'Action|Adventure|Animation|Family\nAction|Adventure|Animation|Family|Fantasy\n' \
    'Action|Adventure|Animation|Family|Mystery\nAction|Adventure|Animation|Family|Science Fiction\n' \
    'Action|Adventure|Animation|Fantasy\nAction|Adventure|Animation|Fantasy|Horror\n' \
    'Action|Adventure|Animation|Fantasy|Science Fiction\n'
    
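# Replace each newline with a vertical bar first, so 'Adventure\nAction' is not counted as one word
data_list = data.replace('\n', '|')
# strip('|') removes the bar left by the trailing newline, so no empty '' entry is counted
data_list = data_list.strip('|').split('|')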
print(Counter(data_list).most_common(1))

Output:
[('Action', 12)]
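Two details the outputs above hide: 'Action' and 'Adventure' are actually tied at 12 (most_common(1) just returns whichever was encountered first), and the header 'Genres' is counted as a word. A minimal stdlib sketch that handles both, assuming the first line is always the column header:

```python
from collections import Counter

data = 'Genres\nAction|Adventure\nAction|Adventure|Animation\nAction|Adventure|Animation|Comedy|Drama\n' \
    'Action|Adventure|Animation|Comedy|Family\nAction|Adventure|Animation|Drama|Family\n' \
    'Action|Adventure|Animation|Family\nAction|Adventure|Animation|Family|Fantasy\n' \
    'Action|Adventure|Animation|Family|Mystery\nAction|Adventure|Animation|Family|Science Fiction\n' \
    'Action|Adventure|Animation|Fantasy\nAction|Adventure|Animation|Fantasy|Horror\n' \
    'Action|Adventure|Animation|Fantasy|Science Fiction\n'

# Split into rows and drop the first one (assumed to be the 'Genres' header)
rows = data.strip().splitlines()[1:]

# Split every row on '|' and count each genre
counts = Counter(genre for row in rows for genre in row.split('|'))

# most_common(1) returns a single entry even when several words share the top count,
# so collect every word that reaches the maximum
top = max(counts.values())
tied = [genre for genre, n in counts.items() if n == top]

print(counts.most_common(3))
print(tied)
```

This prints [('Action', 12), ('Adventure', 12), ('Animation', 11)] and then ['Action', 'Adventure'].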

Lewis

volcano63 · (This post was last modified: Jun-10-2018, 07:38 PM by volcano63.)

(Jun-10-2018, 03:19 PM)Larz60+ Wrote: Use collections:

With \n left in the middle, the Counter produces a wrong result, because items like 'Adventure\nAction' are counted as single words. Replace the newlines first:

data_list = data.strip().replace('\n', '|').split('|')
print(Counter(data_list).most_common(1))

The right result:

Output:
[('Action', 12)]

But I have a nagging suspicion that the column in the OP is in a DataFrame.

Oops, missed the post above.
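If the column does live in a pandas DataFrame, one way to count it is sketched below. The DataFrame, the column name 'Genres' (taken from the sample's header line) and the few sample rows are assumptions, not the OP's actual code; Series.explode needs pandas 0.25 or later.

```python
import io

import pandas as pd

# Hypothetical sample: the first few rows of the 'Genres' column from the thread
csv_text = """Genres
Action|Adventure
Action|Adventure|Animation
Action|Adventure|Animation|Comedy|Drama
"""
df = pd.read_csv(io.StringIO(csv_text))

# str.split turns each cell into a list, explode gives every list item its own row,
# and value_counts counts the occurrences of each genre
counts = df['Genres'].str.split('|').explode().value_counts()
print(counts)
```

Working on the Series this way sidesteps the newline problem entirely, since each cell is already one row.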

**Larz60+** · Jun-10-2018, 07:49 PM

yup, they need to be removed

***snippsat*** · (This post was last modified: Jun-10-2018, 08:39 PM by snippsat.)

(Jun-10-2018, 07:38 PM)volcano63 Wrote: But I have a nagging suspicion that the column in OP is in DataFrame

Yes, I guess this part was left out, on the assumption that everyone knows this is for Pandas.
@Jack_Sparrow, not all Python users use Pandas, so post runnable code, e.g. sample data that gets read in and makes a working DataFrame.
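Along the lines suggested here, a runnable repro can read the sample data straight into a DataFrame. This sketch assumes the data is a CSV with a single 'Genres' column and counts with Series.str.get_dummies(sep='|'), which builds one 0/1 column per genre, so summing each column gives the number of rows containing that genre.

```python
import io

import pandas as pd

# Sample data from the thread, read in as if it were a CSV file with one 'Genres' column
csv_text = """Genres
Action|Adventure
Action|Adventure|Animation
Action|Adventure|Animation|Comedy|Drama
Action|Adventure|Animation|Comedy|Family
Action|Adventure|Animation|Drama|Family
Action|Adventure|Animation|Family
Action|Adventure|Animation|Family|Fantasy
Action|Adventure|Animation|Family|Mystery
Action|Adventure|Animation|Family|Science Fiction
Action|Adventure|Animation|Fantasy
Action|Adventure|Animation|Fantasy|Horror
Action|Adventure|Animation|Fantasy|Science Fiction
"""
df = pd.read_csv(io.StringIO(csv_text))

# One 0/1 indicator column per genre; summing each column counts the rows containing that genre
totals = df['Genres'].str.get_dummies(sep='|').sum().sort_values(ascending=False)
print(totals)
```

Note that get_dummies marks a genre at most once per row, so if a genre could appear twice in the same cell it would only be counted once there.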

volcano63 · Jun-10-2018, 08:52 PM

(Jun-10-2018, 08:39 PM)snippsat Wrote:
@Jack_Sparrow not all Python user use Pandas,and post runnable code like sample data that get read in and making a working DataFrame.

I have scrambled together a solution for Series (now I have to find that Notebook), but, knowing the seeker's nature, I was waiting (in vain?) for the minimal effort.
