Why can't I merge pandas dataframes?
#1
I'm just starting to learn Python and began with the IMDb dataset files (the headers and the data files are described here: https://www.imdb.com/interfaces/).

When I try to merge the two data frames, I keep getting a "key not found" error:

ratings = pd.read_csv('title.ratings.tsv', sep = '\t').drop_duplicates(subset = 'tconst', keep = 'first')
titles = pd.read_csv('title.akas.tsv', sep = '\t').drop_duplicates(subset = 'titleId', keep = 'first')
titles.merge(titles, ratings, left_on="titleId", right_on="tconst")
I can't figure out what I'm doing wrong. Any guidance would be appreciated.
#2
I don't see anything obviously wrong. The exact error you are getting would be helpful. I would also print the two datasets after you pull them but before you try the merge to make sure they are what you expect. Are you sure the error is on the merge, and not on one of the drop_duplicates?
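A minimal sketch of that kind of check, using small hypothetical frames in place of the real IMDb files (the rows below are made up for illustration; in your script the frames would come from pd.read_csv). It prints both frames and confirms that each key column exists in the frame it is supposed to come from:

```python
import pandas as pd

# Hypothetical miniature stand-ins for title.akas.tsv and title.ratings.tsv.
titles = pd.DataFrame({
    "titleId": ["tt0000001", "tt0000001", "tt0000002"],
    "title": ["Carmencita", "Carmencita - spanyol tanc", "Le clown et ses chiens"],
})
ratings = pd.DataFrame({
    "tconst": ["tt0000001", "tt0000002"],
    "averageRating": [5.8, 6.4],
})

# Same de-duplication step as in the question.
titles = titles.drop_duplicates(subset="titleId", keep="first")
ratings = ratings.drop_duplicates(subset="tconst", keep="first")

# Look at both frames before merging.
print(titles.head())
print(ratings.head())

# Check that each merge key is present in the expected frame.
has_left_key = "titleId" in titles.columns
has_right_key = "tconst" in ratings.columns
print(has_left_key, has_right_key)
```

If both checks pass and both frames print as expected, the drop_duplicates calls are fine and the problem is in the merge call itself.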
Craig "Ichabod" O'Brien - xenomind.com
#3
Thanks. Here is the updated code to display head:

ratings = pd.read_csv('title.ratings.tsv', sep = '\t').drop_duplicates(subset = 'tconst', keep = 'first')
titles = pd.read_csv('title.akas.tsv', sep = '\t').drop_duplicates(subset = 'titleId', keep = 'first')
print(titles.head())
print(ratings.head())
titles.merge(titles, ratings, left_on="titleId", right_on="tconst")
The error:
Error:
  File "mihika1.py", line 8, in <module>
    titles.merge(titles, ratings, left_on="titleId", right_on="tconst")
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 5370, in merge
    copy=copy, indicator=indicator, validate=validate)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.py", line 57, in merge
    validate=validate)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.py", line 565, in __init__
    self.join_names) = self._get_merge_keys()
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.py", line 824, in _get_merge_keys
    right_keys.append(right[rk]._values)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 2139, in __getitem__
    return self._getitem_column(key)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 2146, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1842, in _get_item_cache
    values = self._data.get(item)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 3843, in get
    loc = self.items.get_loc(item)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/indexes/base.py", line 2527, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'tconst'
The output from the head
Output:
sys:1: DtypeWarning: Columns (7) have mixed types. Specify dtype option on import or set low_memory=False.
      titleId  ordering                      title  region  language  \
0   tt0000001         1  Carmencita - spanyol tánc      HU        \N
4   tt0000002         1     Le clown et ses chiens      \N        \N
10  tt0000003         1           Sarmanul Pierrot      RO        \N
16  tt0000004         1                Un bon bock      \N        \N
22  tt0000005         1        Blacksmithing Scene      US        \N

          types  attributes  isOriginalTitle
0   imdbDisplay          \N                0
4      original          \N                1
10  imdbDisplay          \N                0
16     original          \N                1
22  alternative          \N                0
      tconst  averageRating  numVotes
0  tt0000001            5.8      1412
1  tt0000002            6.4       167
2  tt0000003            6.6      1006
3  tt0000004            6.4       100
4  tt0000005            6.2      1708

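What a stupid mistake.

titles.merge(titles, ratings, left_on="titleId", right_on="tconst")
should have been:

pd.merge(titles, ratings, left_on="titleId", right_on="tconst")
Called as a method, titles.merge() already uses titles as the left frame and takes its first positional argument as the right frame. So I was merging titles with itself (with ratings passed in the how position), and pandas looked for 'tconst' in titles, which has no such column. That is where the KeyError came from.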
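For anyone who lands here with the same error, here is a small sketch of the two call forms that work. The frames are tiny stand-ins built from a few rows of the head() output above, not the real files, and the variable names merged_func, merged_method, and failed are just illustrative:

```python
import pandas as pd

# Stand-in frames with a few rows copied from the head() output in this thread.
titles = pd.DataFrame({
    "titleId": ["tt0000001", "tt0000002", "tt0000003"],
    "title": ["Carmencita - spanyol tánc", "Le clown et ses chiens", "Sarmanul Pierrot"],
})
ratings = pd.DataFrame({
    "tconst": ["tt0000001", "tt0000002", "tt0000003"],
    "averageRating": [5.8, 6.4, 6.6],
    "numVotes": [1412, 167, 1006],
})

# Function form: both the left and the right frame are passed explicitly.
merged_func = pd.merge(titles, ratings, left_on="titleId", right_on="tconst")

# Method form: the calling frame is the left side, so only the right frame is passed.
merged_method = titles.merge(ratings, left_on="titleId", right_on="tconst")

print(merged_func)
print(merged_func.equals(merged_method))

# The call from the question passes titles as the right frame and ratings as `how`.
# It fails; the exact exception type may differ between pandas versions.
try:
    titles.merge(titles, ratings, left_on="titleId", right_on="tconst")
    failed = False
except Exception as exc:
    failed = True
    print(type(exc).__name__)
```

Both working forms give the same result here; which one to use is mostly a matter of taste, as long as the method form is given only the right-hand frame.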