Thanks. Here is the updated code, which now prints the head of each DataFrame:
import pandas as pd

# Load the IMDb ratings and alternate-title tables, keeping the first row per title ID
ratings = pd.read_csv('title.ratings.tsv', sep='\t').drop_duplicates(subset='tconst', keep='first')
titles = pd.read_csv('title.akas.tsv', sep='\t').drop_duplicates(subset='titleId', keep='first')
print(titles.head())
print(ratings.head())
titles.merge(titles, ratings, left_on="titleId", right_on="tconst")
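A note on the drop_duplicates calls above: keep='first' keeps the first row seen for each ID and leaves the surviving rows' original index labels in place, which is why the titles output below shows an index of 0, 4, 10, ... rather than 0, 1, 2. A minimal sketch on a small hypothetical frame (the rows are made up for illustration, not taken from the IMDb files):

```python
import pandas as pd

# Hypothetical miniature of title.akas: several alternate titles per titleId.
akas = pd.DataFrame({
    "titleId": ["tt0000001", "tt0000001", "tt0000002", "tt0000002", "tt0000002"],
    "title": ["title A (1)", "title A (2)", "title B (1)", "title B (2)", "title B (3)"],
})

# Keep only the first row for each titleId. The surviving rows keep their
# original index labels (0 and 2 here), so the index now has gaps.
first_per_title = akas.drop_duplicates(subset="titleId", keep="first")
print(first_per_title)

# If a contiguous 0..n-1 index is wanted, reset it explicitly.
renumbered = first_per_title.reset_index(drop=True)
print(renumbered)
```

The gaps are harmless for a merge on columns; reset_index only matters if later code relies on positional-looking labels.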
The error:
  File "mihika1.py", line 8, in <module>
    titles.merge(titles, ratings, left_on="titleId", right_on="tconst")
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 5370, in merge
    copy=copy, indicator=indicator, validate=validate)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.py", line 57, in merge
    validate=validate)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.py", line 565, in __init__
    self.join_names) = self._get_merge_keys()
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.py", line 824, in _get_merge_keys
    right_keys.append(right[rk]._values)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 2139, in __getitem__
    return self._getitem_column(key)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 2146, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1842, in _get_item_cache
    values = self._data.get(item)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 3843, in get
    loc = self.items.get_loc(item)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/indexes/base.py", line 2527, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'tconst'
The output (a pandas warning, then the head of each DataFrame):
sys:1: DtypeWarning: Columns (7) have mixed types. Specify dtype option on import or set low_memory=False.
      titleId  ordering                      title  region  language  \
0   tt0000001         1  Carmencita - spanyol tánc      HU        \N
4   tt0000002         1     Le clown et ses chiens      \N        \N
10  tt0000003         1           Sarmanul Pierrot      RO        \N
16  tt0000004         1                Un bon bock      \N        \N
22  tt0000005         1        Blacksmithing Scene      US        \N

          types  attributes  isOriginalTitle
0   imdbDisplay          \N                0
4      original          \N                1
10  imdbDisplay          \N                0
16     original          \N                1
22  alternative          \N                0
      tconst  averageRating  numVotes
0  tt0000001            5.8      1412
1  tt0000002            6.4       167
2  tt0000003            6.6      1006
3  tt0000004            6.4       100
4  tt0000005            6.2      1708
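About the DtypeWarning at the top of the output: the IMDb TSV files use the literal string \N for missing values, and when pandas reads a large file in chunks (the default low_memory=True behaviour) it can infer different types for the same column in different chunks, which is what "Columns (7) have mixed types" reports for isOriginalTitle. A minimal sketch, assuming the files follow the layout shown above, that treats \N as missing and pins the type of that column; the inline TSV is hypothetical sample data, not an excerpt of the real file:

```python
import io

import pandas as pd

# Hypothetical two-row sample in the same column layout as title.akas.tsv;
# the literal \N marks a missing value.
tsv = (
    "titleId\tordering\ttitle\tregion\tlanguage\ttypes\tattributes\tisOriginalTitle\n"
    "tt0000001\t1\tCarmencita\tHU\t\\N\timdbDisplay\t\\N\t0\n"
    "tt0000002\t1\tLe clown et ses chiens\t\\N\t\\N\toriginal\t\\N\t1\n"
)

titles = pd.read_csv(
    io.StringIO(tsv),
    sep="\t",
    na_values=["\\N"],               # treat the \N sentinel as NaN
    dtype={"isOriginalTitle": str},  # one explicit type for the column flagged as mixed
    low_memory=False,                # infer types over the whole file, not per chunk
)
print(titles.dtypes)
```

Either the dtype argument or low_memory=False alone is usually enough to silence the warning; the sketch shows both only to make each option visible.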
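What a stupid mistake. This line:
titles.merge(titles, ratings, left_on="titleId", right_on="tconst")
should have been:
pd.merge(titles, ratings, left_on="titleId", right_on="tconst")
Because merge is a DataFrame method, titles was already the left frame. Passing titles again made it the right frame (and ratings landed in the how argument), so pandas looked up right_on="tconst" in titles, which has no such column, hence KeyError: 'tconst'. The method form also works if only the right frame is passed: titles.merge(ratings, left_on="titleId", right_on="tconst").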
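To make the difference concrete, here is a small sketch with two hypothetical in-memory frames standing in for title.akas.tsv and title.ratings.tsv (a few values copied from the head() output above). It shows the module-level call and the equivalent method call, which passes only the right-hand frame:

```python
import pandas as pd

# Hypothetical miniature versions of the two frames from the question.
titles = pd.DataFrame({
    "titleId": ["tt0000001", "tt0000002"],
    "title": ["Carmencita - spanyol tánc", "Le clown et ses chiens"],
})
ratings = pd.DataFrame({
    "tconst": ["tt0000001", "tt0000002"],
    "averageRating": [5.8, 6.4],
})

# Module-level function: both frames are passed explicitly.
merged = pd.merge(titles, ratings, left_on="titleId", right_on="tconst")

# Method form: titles is the left frame, so only the right frame is passed.
merged_method = titles.merge(ratings, left_on="titleId", right_on="tconst")

print(merged)
```

Because the key columns have different names, both titleId and tconst survive in the result; drop one afterwards if the duplicate is unwanted.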