Python Forum
Is there a Python text mining script to classify text with multiple classifications?
Classification of descriptions into categories

I have a problem that involves determining which category a text description falls under. The descriptions are entered by users and may contain keywords that match a specific category. Each category has its own set of keywords and phrases, and there are about 100 categories. For example, the description “Burlap aisle runner w/borders” contains the keyword “Burlap”, which belongs to the category “Fabric”, so the description could be assigned to that category.

Text description | Category

Orange Burlap aisle runner w/borders | Fabric

However, two complications make this categorization more difficult.

First, some text descriptions contain keywords that match multiple categories. A single description could fall under as many as 20 of the 100 categories because its keywords appear in several of them, so the correct category cannot be determined from keyword matches alone.

For example, the description “Orange Burlap aisle runner w/borders” matches the category “Fruit” through the keyword “Orange” and also the category “Fabric” through the keyword “Burlap”.

Text description | Category

Orange Burlap aisle runner w/borders | Fabric, Fruit
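
One possible way to reduce several matches to a single category is to score each category and keep the highest one. The sketch below is only an illustration of that idea, not a tested solution for the real data: the keyword sets in CATEGORY_KEYWORDS are invented for the example, and the weighting rule (a keyword shared by n categories contributes 1/n to each) is an assumption, chosen so that ambiguous keywords such as "orange" count less than specific ones such as "burlap".

```python
from collections import defaultdict

# Hypothetical keyword sets, invented for illustration only.
CATEGORY_KEYWORDS = {
    "Fabric": {"burlap", "runner", "linen"},
    "Fruit": {"orange", "apple"},
    "Color": {"orange", "red"},
}


def build_index(category_keywords):
    """Invert {category: keywords} into {keyword: set of categories}."""
    index = defaultdict(set)
    for category, keywords in category_keywords.items():
        for keyword in keywords:
            index[keyword].add(category)
    return index


def score_categories(description, index):
    """Score each category; a keyword shared by n categories adds 1/n to each."""
    scores = defaultdict(float)
    for token in description.lower().split():
        matched = index.get(token, ())
        for category in matched:
            scores[category] += 1.0 / len(matched)
    return dict(scores)


def best_category(description, index):
    """Return the highest-scoring category, or None if nothing matched."""
    scores = score_categories(description, index)
    if not scores:
        return None
    return max(scores, key=scores.get)


index = build_index(CATEGORY_KEYWORDS)
scores = score_categories("Orange Burlap aisle runner w/borders", index)
best = best_category("Orange Burlap aisle runner w/borders", index)
```

With these made-up keyword sets, "Fabric" collects two unshared keywords ("burlap" and "runner"), while "Fruit" and "Color" split the single shared keyword "orange". Ties are still possible, so real data would need an explicit tie-breaking rule.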

Second, some text descriptions contain keywords that do not match any category directly, so again the description cannot be categorized correctly.

For example, a description containing the keyword “mouse” does not match the category “Computer Accessory” directly.
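
A simple way to cover indirect matches like this is a hand-maintained alias table that maps extra words to an existing category. The sketch below assumes such a table exists; ALIASES, CATEGORY_KEYWORDS, and categories_for are hypothetical names, and their contents are invented for the example. An alias table only handles words someone has listed in advance, so it is a stopgap rather than a general answer.

```python
# Hypothetical data, invented for illustration only.
CATEGORY_KEYWORDS = {"Computer Accessory": {"computer", "accessory"}}
ALIASES = {"mouse": "Computer Accessory", "keyboard": "Computer Accessory"}


def categories_for(description, category_keywords, aliases):
    """Collect categories hit by a direct keyword or by an alias."""
    found = set()
    for token in description.lower().split():
        # Direct match: the token is one of the category's own keywords.
        for category, keywords in category_keywords.items():
            if token in keywords:
                found.add(category)
        # Indirect match: the token is listed in the alias table.
        if token in aliases:
            found.add(aliases[token])
    return found


direct_only = categories_for("Wireless mouse", CATEGORY_KEYWORDS, {})
with_aliases = categories_for("Wireless mouse", CATEGORY_KEYWORDS, ALIASES)
```

Here direct_only is empty because "mouse" is not a keyword of any category, while with_aliases contains "Computer Accessory".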

Can anyone suggest an algorithm or Python library that can classify text descriptions even when no keyword matches directly, and that assigns each description to a single category instead of several?

So far I have split both the text descriptions and the categories into keywords and then matched them against each other.

This is the code I used to match the text descriptions with the categories.

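[inline]%LivyPy3.pyspark

# For each description (a list of keywords), look up every keyword in categories_list.
entries['category'] = [[categories_list.get(keyword) for keyword in keywords] for keywords in entries['text_description']][/inline]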

However, this script gives either multiple categories or no category at all for a description.
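
The behaviour can be reproduced in plain Python. The sketch below assumes that categories_list is a dict mapping a lowercase keyword to one category and that each description is already a list of tokens; both are guesses based on the use of .get above, and the sample data is invented. It shows that the raw lookup returns None for every unmatched token, and how the result could be reduced to the distinct categories that were actually hit.

```python
# Hypothetical keyword -> category lookup standing in for categories_list.
categories_list = {"burlap": "Fabric", "orange": "Fruit"}


def match_categories(tokens, lookup):
    """Return the distinct categories hit by any token, in first-seen order."""
    hits = (lookup.get(token.lower()) for token in tokens)
    # dict.fromkeys removes duplicates while keeping insertion order.
    return list(dict.fromkeys(c for c in hits if c is not None))


tokens = ["Orange", "Burlap", "aisle", "runner"]

# Same lookup as the one-liner: one result per token, None when unmatched.
raw = list(map(categories_list.get, (t.lower() for t in tokens)))

# Unmatched tokens dropped and repeated categories collapsed.
cleaned = match_categories(tokens, categories_list)
```

This only tidies the output. A description can still end up with several categories or with an empty list.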