
How often does a word occur in this column?
#1
Hello there,
I have data like this:
Genres
Action|Adventure
Action|Adventure|Animation
Action|Adventure|Animation|Comedy|Drama
Action|Adventure|Animation|Comedy|Family
Action|Adventure|Animation|Drama|Family
Action|Adventure|Animation|Family
Action|Adventure|Animation|Family|Fantasy
Action|Adventure|Animation|Family|Mystery
Action|Adventure|Animation|Family|Science Fiction
Action|Adventure|Animation|Fantasy
Action|Adventure|Animation|Fantasy|Horror
Action|Adventure|Animation|Fantasy|Science Fiction


I want to know which genre is the most popular, i.e. how often each word occurs in this column.
How can I do this with Python 3?

Thank you!
J
#2
Use collections:
from collections import Counter


data = 'Genres\nAction|Adventure\nAction|Adventure|Animation\nAction|Adventure|Animation|Comedy|Drama\n' \
    'Action|Adventure|Animation|Comedy|Family\nAction|Adventure|Animation|Drama|Family\n' \
    'Action|Adventure|Animation|Family\nAction|Adventure|Animation|Family|Fantasy\n' \
    'Action|Adventure|Animation|Family|Mystery\nAction|Adventure|Animation|Family|Science Fiction\n' \
    'Action|Adventure|Animation|Fantasy\nAction|Adventure|Animation|Fantasy|Horror\n' \
    'Action|Adventure|Animation|Fantasy|Science Fiction\n'

data_list = data.strip().split('|')
print(Counter(data_list).most_common(1))
Results:
Output:
[('Adventure', 11)]
#3
Great answer @Larz60+. I think the newlines should be replaced by vertical bars before splitting (note the 'Adventure\n' at the end of the first line).
from collections import Counter

data = 'Genres\nAction|Adventure\nAction|Adventure|Animation\nAction|Adventure|Animation|Comedy|Drama\n' \
    'Action|Adventure|Animation|Comedy|Family\nAction|Adventure|Animation|Drama|Family\n' \
    'Action|Adventure|Animation|Family\nAction|Adventure|Animation|Family|Fantasy\n' \
    'Action|Adventure|Animation|Family|Mystery\nAction|Adventure|Animation|Family|Science Fiction\n' \
    'Action|Adventure|Animation|Fantasy\nAction|Adventure|Animation|Fantasy|Horror\n' \
    'Action|Adventure|Animation|Fantasy|Science Fiction\n'
    
text = data.strip().replace('\n', '|')  # strip the trailing newline first (avoids an empty '' item), then treat newlines as separators
data_list = text.split('|')
print(Counter(data_list).most_common(1))
Output:
[('Action', 12)]
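A variant that splits the text line by line, skips the 'Genres' header and prints the whole ranking instead of only the top entry could look like this (a sketch, assuming the data arrives as one string like `data` above). Printing everything also shows that Action and Adventure actually tie at 12, which `most_common(1)` hides.

```python
from collections import Counter

# Same sample data as above, written as one triple-quoted string for brevity
data = """Genres
Action|Adventure
Action|Adventure|Animation
Action|Adventure|Animation|Comedy|Drama
Action|Adventure|Animation|Comedy|Family
Action|Adventure|Animation|Drama|Family
Action|Adventure|Animation|Family
Action|Adventure|Animation|Family|Fantasy
Action|Adventure|Animation|Family|Mystery
Action|Adventure|Animation|Family|Science Fiction
Action|Adventure|Animation|Fantasy
Action|Adventure|Animation|Fantasy|Horror
Action|Adventure|Animation|Fantasy|Science Fiction
"""

# splitlines() handles the row breaks; [1:] drops the 'Genres' header row
lines = data.strip().splitlines()[1:]

# Count every genre on every row
genre_counts = Counter(genre for line in lines for genre in line.split('|'))

# most_common() with no argument returns the full ranking, most frequent first
for genre, count in genre_counts.most_common():
    print(genre, count)
```

Ties keep the order in which the genres were first seen, so Action is listed before Adventure here.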
Lewis
To paraphrase: 'Throw out your dead' code. https://www.youtube.com/watch?v=grbSQ6O6kbs Forward to 1:00
#4
(Jun-10-2018, 03:19 PM)Larz60+ Wrote: Use collections:
With '\n' still inside the items (e.g. 'Adventure\nAction'), Counter produces the wrong result. Replacing the newlines first fixes it:

data_list = data.strip().replace('\n', '|').split('|')
print(Counter(data_list).most_common(1))
The correct result:
Output:
[('Action', 12)]
But I have a nagging suspicion that the column in the OP is in a DataFrame.

Oops, missed the post above.
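If the column does live in a pandas DataFrame, a Series-based count might look like the sketch below. The DataFrame here is hypothetical (built from the OP's sample, with the column assumed to be named 'Genres'), and `Series.explode` needs pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical DataFrame built from the OP's sample; assumes the column is named 'Genres'
df = pd.DataFrame({'Genres': [
    'Action|Adventure',
    'Action|Adventure|Animation',
    'Action|Adventure|Animation|Comedy|Drama',
    'Action|Adventure|Animation|Comedy|Family',
    'Action|Adventure|Animation|Drama|Family',
    'Action|Adventure|Animation|Family',
    'Action|Adventure|Animation|Family|Fantasy',
    'Action|Adventure|Animation|Family|Mystery',
    'Action|Adventure|Animation|Family|Science Fiction',
    'Action|Adventure|Animation|Fantasy',
    'Action|Adventure|Animation|Fantasy|Horror',
    'Action|Adventure|Animation|Fantasy|Science Fiction',
]})

# str.split turns each cell into a list of genres (a one-character
# separator such as '|' is treated literally, not as a regex),
# explode() puts every genre on its own row, and value_counts()
# tallies them, most frequent first
genre_counts = df['Genres'].str.split('|').explode().value_counts()
print(genre_counts)
```

Action and Adventure tie at 12, so it is safer not to rely on which of the two value_counts() lists first.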
Test everything in a Python shell (iPython, Azure Notebook, etc.)
  • Someone gave you advice you liked? Test it - maybe the advice was actually bad.
  • Someone gave you advice you think is bad? Test it before arguing - maybe it was good.
  • You posted a claim that something you did not test works? Be prepared to eat your hat.
#5
Yup, the newlines need to be removed.
#6
(Jun-10-2018, 07:38 PM)volcano63 Wrote: But I have a nagging suspicion that the column in OP is in DataFrame
Yes, I guess this part was left out, on the assumption that everyone would know this is for Pandas.
@Jack_Sparrow, not all Python users use Pandas. Please post runnable code, for example sample data that gets read in and builds a working DataFrame.
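Along those lines, a runnable stdlib-only sketch that reads the sample as a one-column CSV and counts the genres in the 'Genres' column might look like this. The in-memory file stands in for a real one; the file name 'genres.csv' in the comment is hypothetical.

```python
import csv
import io
from collections import Counter

# Stand-in for a real file: the OP's sample as a one-column CSV.
# With a real file you would use open('genres.csv', newline='') instead
# ('genres.csv' is a hypothetical name).
sample_csv = io.StringIO(
    "Genres\n"
    "Action|Adventure\n"
    "Action|Adventure|Animation\n"
    "Action|Adventure|Animation|Comedy|Drama\n"
    "Action|Adventure|Animation|Comedy|Family\n"
    "Action|Adventure|Animation|Drama|Family\n"
    "Action|Adventure|Animation|Family\n"
    "Action|Adventure|Animation|Family|Fantasy\n"
    "Action|Adventure|Animation|Family|Mystery\n"
    "Action|Adventure|Animation|Family|Science Fiction\n"
    "Action|Adventure|Animation|Fantasy\n"
    "Action|Adventure|Animation|Fantasy|Horror\n"
    "Action|Adventure|Animation|Fantasy|Science Fiction\n"
)

# DictReader uses the first row ('Genres') as the field name; the default
# delimiter is ',', so the '|'-separated genres stay inside one field
reader = csv.DictReader(sample_csv)

genre_counts = Counter(
    genre
    for row in reader
    for genre in row['Genres'].split('|')
    if genre  # skip empty cells
)
print(genre_counts.most_common(3))
```

Reading the column by name keeps working if more columns are added to the file later.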
#7
(Jun-10-2018, 08:39 PM)snippsat Wrote: Yes, I guess this part was left out, on the assumption that everyone would know this is for Pandas.

I scrambled together a solution for a Series (now I have to find that Notebook), but, knowing the seeker's nature, I was waiting (in vain?) for some minimal effort.