Jun-24-2019, 02:19 PM
Hello. I am trying to solve a text analytics problem.
I have two data sets, DS1 and DS2. DS1 contains a narrative text field (Description_S), and DS2 contains a narrative text field (Description_C). The two fields come from completely different systems and should NEVER have common content. However, it has been discovered that content from Description_S is being typed or copied and pasted into the Description_C field, which is a major issue.
So I am trying to use Python to determine whether the two fields share common strings, i.e. whether there is an intersection between them. Both narrative fields can get quite lengthy; Description_S is actually type CLOB in Teradata. Any ideas on how to solve this issue? I have looked at the set/intersection method in Python, but wanted to get advice from the Forum first.
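For reference, here is a rough sketch of the set/intersection idea I had in mind. It is only an illustration, not a tested solution: the function names (`shingles`, `shared_shingles`), the window size of 5 words, and the two sample strings are all made up for this post, and the real values would come from the Description_S and Description_C fields. The idea is to break each text into overlapping runs of n words and intersect the two sets, so that any copied run of at least n words shows up.

```python
import re

def shingles(text, n=5):
    """Return the set of overlapping n-word runs in text (lowercased, punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    # If the text has fewer than n words, the range is empty and so is the set.
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_shingles(description_s, description_c, n=5):
    """Return the n-word runs that appear in both texts."""
    return shingles(description_s, n) & shingles(description_c, n)

# Made-up sample values standing in for one row from each data set.
description_s = "Customer reported that the unit failed after two hours of continuous use."
description_c = "Call notes: the unit failed after two hours of continuous use, per source system."

common = shared_shingles(description_s, description_c)
for run in sorted(common):
    print(run)
```

A non-empty result would flag the pair of records for review. I assume the window size would need tuning: too small and ordinary phrases match by chance, too large and short pasted fragments are missed.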
Thank you in advance.