New to Pandas. I need help fixing a TypeError - kramon19 - Dec-30-2020

I loaded a CSV file called 'IMDb movies.csv' from Kaggle. Here is the link to the file: https://www.kaggle.com/stefanoleone992/imdb-extensive-dataset. I am just trying to reinforce what I learned about DataFrames and some basic commands. My code is pasted below. The problem is that I am getting "TypeError: '>=' not supported between instances of 'str' and 'int'", and I do not know why. It comes from this line:

year_before_1970 = movies_df[movies_df['year'] >= 1970]

However, on the line below it I do something similar, and it doesn't give me an error:

print(movies_df[movies_df['avg_vote'] >= 8.6].head(3))

Can someone please help me figure out what went wrong?

import pandas as pd

movies_df = pd.read_csv('IMDb movies.csv')

#prints the first 5 rows of Dataset
print('***********************************\nFirst 5 rows\n',movies_df.head(), '\n***********************************')

#selects the columns headers of the Dataset
col = movies_df.columns
print('Headers of Dataset\n', col, '\n***********************************')

#which year produced the most movies
most_movies_yearly = movies_df.groupby('year').imdb_title_id.count().reset_index()
print(most_movies_yearly)

# movies were produced before the year 1970
year_before_1970 = movies_df[movies_df['year'] >= 1970]
print(year_before_1970)

RE: New to Pandas. I need help fixing a TypeError - snippsat - Dec-30-2020

movies_df['year'] is being read as a string (object) column. Use movies_df.dtypes to check which type each column of the DataFrame has, then clean the column up so that year becomes int64. Example:

import pandas as pd
from io import StringIO

data = StringIO('''\
Movie,Year
Seven,1995
The Godfather,1972 end
Jaws,1975
Lawrence of Arabia,1962''')

df = pd.read_csv(data, sep=',')
print(df)
print(df.dtypes)

Clean up and fix the type.

df['Year'] = df['Year'].str.extract(r'(\d+)')
df['Year'] = pd.to_numeric(df["Year"])
print(df)
print(df.dtypes)

Now you can find the movies before 1970.

print(df[df['Year'] < 1970])
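
A possible follow-up to the example above, as a rough sketch rather than a definitive fix: before cleaning, it can help to see which rows actually stop the column from being numeric. Passing errors='coerce' to pd.to_numeric turns anything it cannot parse into NaN, so those rows can be listed first. The sample data and the names as_number, bad_rows and before_1970 are made up for illustration, and the real 'IMDb movies.csv' may need different cleaning.

```python
import pandas as pd
from io import StringIO

# Made-up sample mirroring the example above; the stray 'end'
# forces pandas to read Year as a string column.
data = StringIO('''\
Movie,Year
Seven,1995
The Godfather,1972 end
Jaws,1975
Lawrence of Arabia,1962''')
df = pd.read_csv(data)

# Values that cannot be parsed as numbers become NaN with errors='coerce',
# which makes the offending rows easy to list.
as_number = pd.to_numeric(df['Year'], errors='coerce')
bad_rows = df[as_number.isna()]
print(bad_rows)

# Keep the first run of digits in each value, then convert to a number.
# expand=False makes str.extract return a Series instead of a DataFrame.
df['Year'] = pd.to_numeric(df['Year'].str.extract(r'(\d+)', expand=False))
print(df.dtypes)

# With a numeric column the comparison no longer raises a TypeError.
before_1970 = df[df['Year'] < 1970]
print(before_1970)
```

For the original question, the same idea would be applied to movies_df['year'] instead of df['Year'], assuming the non-numeric values there also contain the year as a run of digits.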