Python Forum

Hello. I am attempting to compare two columns sar_details_sent_norm_trigrams_ and
caap_details_sent_norm_trigrams_ in a Pandas data frame. There are other columns as well, but these are the two I am comparing.

Essentially, I want to keep the records where the text values in the two columns are the same.
I've tried a couple of approaches, but I keep getting the following error message:

TypeError: unhashable type: 'set'

So I either need to work out why I am getting this error and fix it, or try another approach.
Any advice would be greatly appreciated.

Thanks.

Code snippet:

# Set with unique terms

df_sar['sar_details_sent_norm_trigrams_unique'] = df_sar['sar_details_sent_norm_trigrams_'].apply(lambda x: set([trigram for sent in x for trigram in sent]))

# Set with unique terms

df_caap['caap_details_sent_norm_trigrams_unique'] = df_caap['caap_details_sent_norm_trigrams_'].apply(lambda x: set([trigram for sent in x for trigram in sent]))



# Attempt 1:

df_caap[df_caap.caap_details_sent_norm_trigrams_unique.isin(df_sar.sar_details_sent_norm_trigrams_unique)]


# Attempt 2:

set(df_caap.caap_details_sent_norm_trigrams_unique).intersection(set(df_sar.sar_details_sent_norm_trigrams_unique))

TypeError Traceback (most recent call last)
<ipython-input-171-2c2bb5551c7e> in <module>()
21 #set(df1.columns).intersection(set(df2.columns))
22
---> 23 set(df_caap.caap_details_sent_norm_trigrams_unique).intersection(set(df_sar.sar_details_sent_norm_trigrams_unique))

TypeError: unhashable type: 'set'

You can't put a set in a set because set elements must be hashable, and a mutable set is not. You can convert each inner set to a frozenset (or a tuple) to make it immutable and hashable, so that it can be put into a set.
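
A minimal sketch of that fix in plain Python, with no pandas involved. The names caap_rows and sar_rows and the sample trigrams are made up for illustration; only the set/frozenset behaviour is the point.

```python
# Hypothetical sample data: each row is a set of trigrams (tuples of three words).
caap_rows = [
    {("a", "b", "c"), ("b", "c", "d")},
    {("x", "y", "z")},
]
sar_rows = [
    {("b", "c", "d"), ("a", "b", "c")},  # same trigrams as the first caap row
    {("p", "q", "r")},
]

# Building a set of plain sets raises TypeError, because set is unhashable.
try:
    set(caap_rows)
    failed = False
except TypeError:
    failed = True

# frozenset is hashable, so a set of frozensets works and supports intersection.
caap_frozen = {frozenset(row) for row in caap_rows}
sar_frozen = {frozenset(row) for row in sar_rows}
common = caap_frozen & sar_frozen
print(failed, common)
```

Because frozenset equality ignores element order, the first row of each collection counts as the same value even though the trigrams were written in a different order.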

Ahhh, thank you. Much appreciated!

A quick and dirty solution that needs less Python knowledge:

hashable_data = tuple(set(ITERABLE))

Mutable built-in containers (list, set, dict) don't have a hash, because they can mutate.
Immutable objects don't change, so they can have a hash.
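
One caveat with the tuple approach: the iteration order of a set is an implementation detail, so two equal sets are not guaranteed to produce equal tuples. A hedged sketch of a safer variant, assuming the items can be sorted (the helper name as_hashable is made up for illustration):

```python
def as_hashable(iterable):
    """Return a hashable, order-independent tuple of the unique items.

    Assumes the items are hashable and mutually comparable, so that
    sorted() can put them into one canonical order.
    """
    return tuple(sorted(set(iterable)))


first = as_hashable(["b", "a", "c", "a"])
second = as_hashable(["c", "b", "a"])

# Both inputs hold the same unique items, so the canonical tuples are equal
# and either one can be used to look up the other in a dict or set.
lookup = {first: "seen"}
print(second in lookup)
```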

There is also a built-in type called frozenset, and yes, it does what it sounds like.
It is an immutable set, which has a hash.

You can test this yourself:

# will fail
{set(): 42}

# is ok
{frozenset(): 42}
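
The same check can be run across several built-in types at once. This is only a sketch; the helper name is_hashable and the sample values are made up for illustration.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


samples = {
    "list": [1, 2],
    "set": {1, 2},
    "dict": {"a": 1},
    "tuple": (1, 2),
    "frozenset": frozenset({1, 2}),
    "tuple holding a set": ({1, 2},),
}

for name, value in samples.items():
    print(f"{name}: {is_hashable(value)}")
```

Note the last entry: a tuple is only hashable if everything inside it is hashable too.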

Try this (and make the same change for the df_caap column):

df_sar['sar_details_sent_norm_trigrams_unique'] = df_sar['sar_details_sent_norm_trigrams_'].apply(lambda x: frozenset([trigram for sent in x for trigram in sent]))

You can also remove the square brackets. It then becomes a generator expression, which frozenset consumes directly (saves memory).
Otherwise a list of all the trigrams is first created in memory and then passed to frozenset.

df_sar['sar_details_sent_norm_trigrams_unique'] = df_sar['sar_details_sent_norm_trigrams_'].apply(lambda x: frozenset(trigram for sent in x for trigram in sent))
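
To see the whole idea end to end without pandas, here is a small sketch. The function name unique_trigrams and the sample rows are assumptions for illustration; each row is taken to be a list of sentences, and each sentence a list of trigram tuples, which is what the lambda above implies.

```python
def unique_trigrams(sentences):
    """Flatten a list of sentences (each a list of trigrams) into a frozenset."""
    # Generator expression: items are fed to frozenset one at a time,
    # so no intermediate list is built.
    return frozenset(trigram for sent in sentences for trigram in sent)


# Hypothetical rows standing in for the two DataFrame columns.
sar_column = [
    [[("a", "b", "c"), ("b", "c", "d")], [("a", "b", "c")]],
    [[("p", "q", "r")]],
]
caap_column = [
    [[("b", "c", "d")], [("a", "b", "c")]],
    [[("x", "y", "z")]],
]

sar_unique = {unique_trigrams(row) for row in sar_column}

# Keep the caap rows whose set of unique trigrams also appears in sar.
matches = [row for row in caap_column if unique_trigrams(row) in sar_unique]
print(len(matches))
```

The same function could be passed to .apply in place of the lambda.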

(May-08-2019, 05:32 PM) DeaD_EyE wrote: Quick and dirty solution with less Python knowledge: [...]

This is great - thank you.

twinpiques

micseydel

twinpiques

DeaD_EyE

twinpiques