Thread: how often does a word occur in this column? (Python Forum, Data Science; https://python-forum.io, /thread-10857.html)

how often does a word occur in this column? - Jack_Sparrow - Jun-10-2018

Hello there,

I have data like this:

    Genres
    Action|Adventure
    Action|Adventure|Animation
    Action|Adventure|Animation|Comedy|Drama
    Action|Adventure|Animation|Comedy|Family
    Action|Adventure|Animation|Drama|Family
    Action|Adventure|Animation|Family
    Action|Adventure|Animation|Family|Fantasy
    Action|Adventure|Animation|Family|Mystery
    Action|Adventure|Animation|Family|Science Fiction
    Action|Adventure|Animation|Fantasy
    Action|Adventure|Animation|Fantasy|Horror
    Action|Adventure|Animation|Fantasy|Science Fiction

I want to know what genre is the most popular? (= how often does a word occur in this column?)
How can I do it with Python 3?

Thank you!
J

RE: how often does a word occur in this column? - Larz60+ - Jun-10-2018

Use collections:

    from collections import Counter

    data = 'Genres\nAction|Adventure\nAction|Adventure|Animation\nAction|Adventure|Animation|Comedy|Drama\n' \
        'Action|Adventure|Animation|Comedy|Family\nAction|Adventure|Animation|Drama|Family\n' \
        'Action|Adventure|Animation|Family\nAction|Adventure|Animation|Family|Fantasy\n' \
        'Action|Adventure|Animation|Family|Mystery\nAction|Adventure|Animation|Family|Science Fiction\n' \
        'Action|Adventure|Animation|Fantasy\nAction|Adventure|Animation|Fantasy|Horror\n' \
        'Action|Adventure|Animation|Fantasy|Science Fiction\n'

    data_list = data.strip().split('|')
    print(Counter(data_list).most_common(1))

Results:

RE: how often does a word occur in this column? - ljmetzger - Jun-10-2018

Great answer @Larz60+. I think the newlines should be replaced by vertical bars before parsing (see 'Adventure\n' in line one).

    from collections import Counter

    data = 'Genres\nAction|Adventure\nAction|Adventure|Animation\nAction|Adventure|Animation|Comedy|Drama\n' \
        'Action|Adventure|Animation|Comedy|Family\nAction|Adventure|Animation|Drama|Family\n' \
        'Action|Adventure|Animation|Family\nAction|Adventure|Animation|Family|Fantasy\n' \
        'Action|Adventure|Animation|Family|Mystery\nAction|Adventure|Animation|Family|Science Fiction\n' \
        'Action|Adventure|Animation|Fantasy\nAction|Adventure|Animation|Fantasy|Horror\n' \
        'Action|Adventure|Animation|Fantasy|Science Fiction\n'

    # Turn every line break into a separator so each genre becomes its own token
    data_list = data.replace('\n', '|')
    data_list = data_list.strip().split('|')
    print(Counter(data_list).most_common(1))

Lewis

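Both fixes in this thread still count the 'Genres' header as if it were a genre (it only occurs once, so it does not change the top result). As a minimal sketch, assuming the column really is a plain multi-line string as in the posts above, the rows can also be processed one at a time with the header skipped. The names `rows` and `genre_counts` are made up for this illustration.

```python
from collections import Counter

# Sample column copied from the original post; the first line is the header.
data = (
    "Genres\n"
    "Action|Adventure\n"
    "Action|Adventure|Animation\n"
    "Action|Adventure|Animation|Comedy|Drama\n"
    "Action|Adventure|Animation|Comedy|Family\n"
    "Action|Adventure|Animation|Drama|Family\n"
    "Action|Adventure|Animation|Family\n"
    "Action|Adventure|Animation|Family|Fantasy\n"
    "Action|Adventure|Animation|Family|Mystery\n"
    "Action|Adventure|Animation|Family|Science Fiction\n"
    "Action|Adventure|Animation|Fantasy\n"
    "Action|Adventure|Animation|Fantasy|Horror\n"
    "Action|Adventure|Animation|Fantasy|Science Fiction\n"
)

# Split into lines first, then drop the header line ('Genres').
rows = data.strip().splitlines()[1:]

# Count every genre, one row at a time.
genre_counts = Counter()
for row in rows:
    genre_counts.update(row.split("|"))

print(genre_counts.most_common(3))
# [('Action', 12), ('Adventure', 12), ('Animation', 11)]
```

Note that 'Action' and 'Adventure' are tied at 12, so `most_common(1)` on its own only reports whichever of the two was seen first.
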
RE: how often does a word occur in this column? - volcano63 - Jun-10-2018

(Jun-10-2018, 03:19 PM)Larz60+ Wrote: Use collections:

With \n in the middle, the Counter will produce a wrong result.

    data_list = data.strip().replace('\n', '|').split('|')
    print(Counter(data_list).most_common(1))

The right result -

But I have a nagging suspicion that the column in the OP is in a DataFrame.

Ooops, missed the post above.

RE: how often does a word occur in this column? - Larz60+ - Jun-10-2018

Yup, they need to be removed.

RE: how often does a word occur in this column? - snippsat - Jun-10-2018

(Jun-10-2018, 07:38 PM)volcano63 Wrote: But I have a nagging suspicion that the column in OP is in DataFrame

Yes, I guess this part was left out on the assumption that everyone would know this is for Pandas.
@Jack_Sparrow, not all Python users use Pandas, so please post runnable code, such as sample data that gets read in and builds a working DataFrame.

RE: how often does a word occur in this column? - volcano63 - Jun-10-2018

(Jun-10-2018, 08:39 PM)snippsat Wrote: (Jun-10-2018, 07:38 PM)volcano63 Wrote: But I have a nagging suspicion that the column in OP is in DataFrame
Yes, I guess this part was left out on the assumption that everyone would know this is for Pandas.

I have scrambled together a solution for a Series (now I have to find that Notebook), but, knowing the seeker's nature, I was waiting (in vain?) for the minimal effort.
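
The Series solution mentioned above was never posted in this thread. As a rough sketch only, and assuming the data really does sit in a pandas DataFrame with a column named "Genres" (the OP never confirmed this), one common approach is to split each cell and count the pieces. The DataFrame `df` below is a hypothetical stand-in built from the sample in the first post, and `Series.explode` needs pandas 0.25 or newer.

```python
import pandas as pd

# Hypothetical stand-in for the OP's DataFrame; only the column name
# "Genres" and the values come from the sample in the first post.
df = pd.DataFrame({"Genres": [
    "Action|Adventure",
    "Action|Adventure|Animation",
    "Action|Adventure|Animation|Comedy|Drama",
    "Action|Adventure|Animation|Comedy|Family",
    "Action|Adventure|Animation|Drama|Family",
    "Action|Adventure|Animation|Family",
    "Action|Adventure|Animation|Family|Fantasy",
    "Action|Adventure|Animation|Family|Mystery",
    "Action|Adventure|Animation|Family|Science Fiction",
    "Action|Adventure|Animation|Fantasy",
    "Action|Adventure|Animation|Fantasy|Horror",
    "Action|Adventure|Animation|Fantasy|Science Fiction",
]})

genre_counts = (
    df["Genres"]
    .str.split("|")   # each cell becomes a list of genres
    .explode()        # one row per individual genre
    .value_counts()   # occurrences per genre, most frequent first
)

print(genre_counts.head(3))
```

Here `genre_counts` is itself a Series indexed by genre, so `genre_counts["Family"]` gives the count for a single genre.
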