Python Forum
Error Message: TypeError: unhashable type: 'set'
#1
Hello. I am trying to compare two pandas columns: sar_details_sent_norm_trigrams_ (in df_sar) and
caap_details_sent_norm_trigrams_ (in df_caap). There are other columns as well, but these are the two I am comparing.

Essentially, I want to keep the records where the text values in the two columns are the same.
I've tried a couple of approaches; however, I keep getting the following error message:

TypeError: unhashable type: 'set'

So, I either need to resolve why I am receiving this and fix it or try another approach, of course.
Any advice would be greatly appreciated.

Thanks.

Code snippet:

# Set with unique terms
df_sar['sar_details_sent_norm_trigrams_unique'] = df_sar['sar_details_sent_norm_trigrams_'].apply(lambda x: set([trigram for sent in x for trigram in sent]))

# Set with unique terms
df_caap['caap_details_sent_norm_trigrams_unique'] = df_caap['caap_details_sent_norm_trigrams_'].apply(lambda x: set([trigram for sent in x for trigram in sent]))

# Attempt 1:
df_caap[df_caap.caap_details_sent_norm_trigrams_unique.isin(df_sar.sar_details_sent_norm_trigrams_unique)]

# Attempt 2:
set(df_caap.caap_details_sent_norm_trigrams_unique).intersection(set(df_sar.sar_details_sent_norm_trigrams_unique))

TypeError Traceback (most recent call last)
<ipython-input-171-2c2bb5551c7e> in <module>()
21 #set(df1.columns).intersection(set(df2.columns))
22
---> 23 set(df_caap.caap_details_sent_norm_trigrams_unique).intersection(set(df_sar.sar_details_sent_norm_trigrams_unique))

TypeError: unhashable type: 'set'
#2
You can't put a set in a set, because a set can only contain hashable objects and an ordinary (mutable) set is not hashable. Convert each inner set to a frozenset or a tuple to make it immutable and hashable, so that it can be put into a set.
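As a minimal sketch of that advice applied to the intersection in Attempt 2, the rows below are hypothetical stand-ins for the per-row trigram sets (the variable names and sample strings are not from the original post):

```python
# Hypothetical sample data standing in for the per-row trigram sets.
sar_rows = [{"a b c", "b c d"}, {"x y z"}]
caap_rows = [{"b c d", "a b c"}, {"q r s"}]

# set(sar_rows) would raise a TypeError here, because each row is a mutable set.
# Freezing each row first makes it hashable, so it can live inside another set.
sar_frozen = {frozenset(row) for row in sar_rows}
caap_frozen = {frozenset(row) for row in caap_rows}

# Rows whose trigram sets are identical in both collections.
common = sar_frozen & caap_frozen
print(common)
```

frozenset equality ignores element order, so the two rows holding the same trigrams match even though they were written in a different order.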
#3
Ahhh, thank you. Much appreciated!
#4
Quick and dirty solution with less Python knowledge:
hashable_data = tuple(set(ITERABLE))
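Built-in mutable containers (list, set, dict) don't have a hash, because their contents can change.
Immutable built-ins such as tuple and frozenset don't change, so they have a hash as long as their elements are hashable too.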
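One caveat with the tuple route: a set has no defined order, so it is safer not to rely on tuple(set(...)) producing the same tuple for two equal sets. A minimal sketch, assuming the elements can be sorted (plain strings here), sorts before converting; as_sorted_tuple is a hypothetical helper name, not something from the thread:

```python
def as_sorted_tuple(iterable):
    """Deduplicate, then sort, so equal collections give identical tuples."""
    return tuple(sorted(set(iterable)))

a = as_sorted_tuple(["b", "a", "b"])
b = as_sorted_tuple(["a", "b"])

# Both inputs reduce to the same canonical tuple, so they compare and hash equal.
print(a == b, hash(a) == hash(b))

# A frozenset gives the same order-independent equality without sorting.
print(frozenset(["b", "a", "b"]) == frozenset(["a", "b"]))
```

If the goal is only to compare or intersect the values, frozenset is the more direct choice; the sorted tuple is mainly useful when a stable, readable ordering is also wanted.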

There is also a built-in type called frozenset, and yes, it does what it sounds like.
It is an immutable set, which has a hash.

You can test this:

# will fail
{set(): 42}

# is ok
{frozenset(): 42}

Try this:

df_sar['sar_details_sent_norm_trigrams_unique'] = df_sar['sar_details_sent_norm_trigrams_'].apply(lambda x: frozenset([trigram for sent in x for trigram in sent]))
You can also remove the square brackets. Then it's a generator expression, which frozenset consumes directly (saves memory).
Otherwise a list of all trigrams is first created in memory, and only then is the frozenset built from it.

df_sar['sar_details_sent_norm_trigrams_unique'] = df_sar['sar_details_sent_norm_trigrams_'].apply(lambda x: frozenset(trigram for sent in x for trigram in sent))
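Note that the caap column needs the same treatment before Attempt 1 can work, since both sides of the comparison have to be hashable. A minimal end-to-end sketch, assuming pandas is installed and using two hypothetical toy frames (the sample trigrams and the unique_trigrams helper are made up for illustration):

```python
import pandas as pd

def unique_trigrams(sentences):
    """Flatten per-sentence trigram lists into one hashable frozenset."""
    return frozenset(trigram for sent in sentences for trigram in sent)

# Hypothetical toy data: each cell is a list of sentences,
# and each sentence is a list of trigram strings.
df_sar = pd.DataFrame({
    "sar_details_sent_norm_trigrams_": [
        [["a b c", "b c d"]],
        [["x y z"]],
    ]
})
df_caap = pd.DataFrame({
    "caap_details_sent_norm_trigrams_": [
        [["b c d"], ["a b c"]],
        [["q r s"]],
    ]
})

df_sar["sar_details_sent_norm_trigrams_unique"] = (
    df_sar["sar_details_sent_norm_trigrams_"].apply(unique_trigrams)
)
df_caap["caap_details_sent_norm_trigrams_unique"] = (
    df_caap["caap_details_sent_norm_trigrams_"].apply(unique_trigrams)
)

# Attempt 1 from the question, now operating on frozensets instead of sets.
matches = df_caap[
    df_caap["caap_details_sent_norm_trigrams_unique"].isin(
        df_sar["sar_details_sent_norm_trigrams_unique"]
    )
]
print(matches)
```

This keeps the caap rows whose full set of unique trigrams also appears as a row in df_sar; it is an exact-match test on whole sets, not a partial overlap.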
#5
(May-08-2019, 05:32 PM)DeaD_EyE Wrote: Quick and dirty solution with less Python knowledge:

This is great - thank you.