Python Forum

Movie lens data analysis
Data set screenshot

How can I find the number of movies per genre using the item data?
How can I find the movies that have more than one genre?
How can I drop the movies where the genre is unknown?
  • How can I find the number of movies per genre using the item data? Each genre column holds a 0/1 flag, so calling .sum() on the genre columns of the movie DataFrame should give you the number of movies per genre. (.count() only counts non-null entries, so it would return the total number of rows for every genre.)

  • Movies that have more than one genre: total the genre columns for each movie. If the total is greater than 1, that movie has more than one genre.

  • Drop the movies where the genre is unknown: select the movies where df['unknown'] == 1 and drop those rows.
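
The three steps above can be sketched as follows. This is only a minimal illustration on a made-up table: the movie titles, the shortened list of genre columns, and the variable names are assumptions, and it presumes the real item data has one 0/1 column per genre as described in this thread.

```python
import pandas as pd

# Toy stand-in for the item data: one row per movie, one 0/1 column per genre.
# Only a few genre columns are shown and the titles are invented.
genre_cols = ["unknown", "Action", "Comedy", "Drama"]
items = pd.DataFrame(
    {
        "movie title": ["Movie A", "Movie B", "Movie C", "Movie D"],
        "unknown": [0, 0, 1, 0],
        "Action": [1, 0, 0, 1],
        "Comedy": [1, 1, 0, 0],
        "Drama": [0, 0, 0, 1],
    }
)

# 1. Number of movies per genre: add up each 0/1 column.
movies_per_genre = items[genre_cols].sum()

# 2. Movies with more than one genre: add up the flags across each row.
genres_per_movie = items[genre_cols].sum(axis=1)
multi_genre = items.loc[genres_per_movie > 1, "movie title"]

# 3. Drop movies whose genre is unknown: keep rows where the flag is not 1.
known = items[items["unknown"] != 1]

print(movies_per_genre)
print(multi_genre.tolist())
print(known["movie title"].tolist())
```

With the real data, genre_cols would list all 19 genre columns from the item file.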


Hi satya,

Thank you for your reply, I really appreciate it. I don't have any programming background, so I am struggling a bit with this.

I have used:

df=pd.DataFrame({'Genre':['unknown','Action','Adventure','Animation','Childrens','Comedy','Crime','Documentary','Drama','Fantasy','Film-Noir','Horror','Musical','Mystery','Romance','Sci-Fi','Thriller','War','Western']},
index=['movie title'])

df.set_index(["unknown","Action","Adventure","Animation","Childrens","Comedy","Crime","Documentary","Drama","Fantasy","Film-Noir","Horror","Musical","Mystery","Romance","Sci-Fi","Thriller","War","Western"]).count(level="movie title")

But it's not working. Am I doing anything wrong?

If the genres were rows corresponding to each movie name, this would probably have been easy. One-hot encoding changed the rows to columns, and I don't know how to turn those columns back into rows.
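
One possible way to turn the genre columns back into rows is DataFrame.melt. The sketch below uses an invented two-movie table with only three genre columns. The names long_form, genre, and flag are chosen for illustration and do not come from the data set.

```python
import pandas as pd

genre_cols = ["Action", "Comedy", "Drama"]  # shortened list for illustration
items = pd.DataFrame(
    {
        "movie title": ["Movie A", "Movie B"],
        "Action": [1, 0],
        "Comedy": [1, 1],
        "Drama": [0, 0],
    }
)

# Turn the genre columns into rows: one row per (movie, genre) pair.
long_form = items.melt(
    id_vars="movie title",
    value_vars=genre_cols,
    var_name="genre",
    value_name="flag",
)

# Keep only the pairs where the movie actually has that genre.
long_form = long_form[long_form["flag"] == 1].drop(columns="flag")

# Count movies per genre from the long table.
movies_per_genre = long_form.groupby("genre")["movie title"].count()
print(movies_per_genre)
```

Genres with no movies (Drama in this toy table) do not appear in the grouped result, which is one difference from summing the one-hot columns directly.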

Oh sorry, my bad. I didn't know it was supposed to be done like that.

Corrected:

df=pd.DataFrame({'Genre': ['unknown','Action','Adventure','Animation','Childrens','Comedy','Crime','Documentary','Drama','Fantasy','Film-Noir','Horror','Musical','Mystery','Romance','Sci-Fi','Thriller','War','Western']},
                  index=['movie title'])

df.set_index(["unknown","Action","Adventure","Animation","Childrens","Comedy","Crime","Documentary","Drama","Fantasy","Film-Noir","Horror","Musical","Mystery","Romance","Sci-Fi","Thriller","War","Western"]).count(level="movie title")
(Feb-19-2020, 02:36 PM) sekhar_desiraju wrote: hi satya, ...

I tried to work with the lines of code to solve the same problem but it is not returning the right solution. Any reason why this is the case?
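
A likely reason, shown here as a small sketch with a shortened genre list: the snippet builds a new DataFrame whose only column is 'Genre', so the genre names become values in that column and not column names, and the item data is never loaded. The flag variables below exist only to make the two failures visible.

```python
import pandas as pd

genres = ["unknown", "Action", "Adventure"]  # shortened list for illustration

# The constructor receives a 'Genre' column with several values but an index
# with a single label, so the lengths do not match.
try:
    pd.DataFrame({"Genre": genres}, index=["movie title"])
    constructor_failed = False
except ValueError:
    constructor_failed = True

# Even without the index argument, the genre names are values in one column,
# not column names, so set_index cannot find them.
df = pd.DataFrame({"Genre": genres})
try:
    df.set_index(genres)
    set_index_failed = False
except KeyError:
    set_index_failed = True

print(constructor_failed, set_index_failed)
```

Separately, count(level=...) only groups by index levels, and the level argument was removed in pandas 2.0. Reading the item file into a DataFrame and working on its existing genre columns avoids all three problems.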
Hi there,

Can you please share the code here for the question about movies that have more than one genre? My answer is coming out as zero. Not sure why.

Looking forward to hearing from you!

Thanks!
Shivam
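
Without seeing the code it is hard to say why the result is zero. One guess is that each 0/1 flag is being compared with 1 individually, which can never be true. The sketch below contrasts that with comparing the row total. The table and variable names are invented for illustration.

```python
import pandas as pd

genre_cols = ["Action", "Comedy", "Drama"]  # shortened list for illustration
items = pd.DataFrame(
    {
        "movie title": ["Movie A", "Movie B", "Movie C"],
        "Action": [1, 0, 1],
        "Comedy": [1, 1, 0],
        "Drama": [0, 0, 1],
    }
)

# Pitfall: every flag is 0 or 1, so no single flag is ever greater than 1
# and the count comes out as zero.
wrong_count = int((items[genre_cols] > 1).any(axis=1).sum())

# Add the flags across each row first, then compare the total with 1.
genre_total = items[genre_cols].sum(axis=1)
right_count = int((genre_total > 1).sum())

print(wrong_count, right_count)
```

If the row totals are correct but the count is still zero, it may be worth checking that the genre columns were read as numbers and not as text.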