Python Forum
Is there a Python text mining script to classify text with multiple classifications?
Classification of descriptions into categories

I have a problem that involves determining which category a text description falls under. The descriptions are entered by users and may contain keywords that match a specific category. Each category has its own set of keywords and phrases, and there are about 100 categories. For example, for the text description “Burlap aisle runner w/borders”, the category “Fabric” contains the keyword “Burlap”, so the description could fall under that category.

text description/category

Burlap aisle runner w/borders/Fabric
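
A minimal sketch of this kind of keyword matching, for readers who want something concrete to run. The keyword sets and the names `CATEGORY_KEYWORDS`, `tokenize`, and `matching_categories` are hypothetical and only stand in for the real data (about 100 categories). The sketch handles single-word keywords only, not phrases.

```python
import re

# Hypothetical keyword sets, for illustration only.
CATEGORY_KEYWORDS = {
    "Fabric": {"burlap", "linen"},
    "Fruit": {"orange", "apple"},
}


def tokenize(description):
    """Lowercase a description and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", description.lower())


def matching_categories(description, category_keywords=CATEGORY_KEYWORDS):
    """Return every category sharing at least one keyword with the description."""
    tokens = set(tokenize(description))
    return sorted(
        category
        for category, keywords in category_keywords.items()
        if tokens & keywords
    )


print(matching_categories("Burlap aisle runner w/borders"))
```

With these made-up keyword sets the call prints `['Fabric']`, because “burlap” is the only token that appears in any keyword set.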

However, there are a couple of exceptions that make this categorization process more difficult.

First, some text descriptions contain keywords that match multiple categories. For example, a single description could fall under 20 of the 100 categories because the same keywords appear in several categories' keyword sets. This makes it impossible to pick the correct category.

For example, the text description “Orange Burlap aisle runner w/borders” contains the keyword “Orange”, which falls under the category “Fruit”, and the keyword “Burlap”, which falls under “Fabric”.

text description/category

Orange Burlap aisle runner w/borders/Fabric, Fruit
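
One possible way to reduce such multi-matches to a single category is to let each keyword vote with a weight that shrinks as more categories list it (an IDF-style weight), then keep the highest-scoring category. This is only a sketch under assumptions: the keyword sets, the extra “Color” category, and the function names below are hypothetical, and the weighting is a swapped-in heuristic, not something taken from the original matching code.

```python
import math
import re
from collections import Counter

# Hypothetical keyword sets; "orange" is deliberately shared by two categories.
CATEGORY_KEYWORDS = {
    "Fabric": {"burlap", "linen"},
    "Fruit": {"orange", "apple"},
    "Color": {"orange", "blue"},
}


def keyword_weights(category_keywords):
    """Give keywords listed by fewer categories a larger weight."""
    n_categories = len(category_keywords)
    counts = Counter(kw for kws in category_keywords.values() for kw in kws)
    # The +1.0 keeps a keyword listed by every category from scoring zero.
    return {kw: math.log(n_categories / c) + 1.0 for kw, c in counts.items()}


def best_category(description, category_keywords=CATEGORY_KEYWORDS):
    """Return the single highest-scoring category, or None if nothing matches."""
    tokens = set(re.findall(r"[a-z0-9]+", description.lower()))
    weights = keyword_weights(category_keywords)
    scores = {
        category: sum(weights[kw] for kw in tokens & keywords)
        for category, keywords in category_keywords.items()
    }
    best = max(scores, key=scores.get)  # ties go to the first category listed
    return best if scores[best] > 0 else None


print(best_category("Orange Burlap aisle runner w/borders"))
```

Here “burlap” belongs to one category and “orange” to two, so “Fabric” scores highest and is printed. Real keyword lists may still produce ties, which this sketch breaks arbitrarily by dictionary order.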

Second, some keywords in a text description do not match any category directly. Again, this prevents correct categorization.

For example, a text description that contains the keyword “mouse” does not match directly with the category “Computer Accessory”.
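
Indirect matches like this are usually learned from labelled examples and not from a fixed keyword list. The sketch below is a small multinomial naive Bayes classifier written with the standard library only. The labelled examples and the names `LABELLED`, `train`, and `predict` are hypothetical, and a real data set would need far more examples per category.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical hand-labelled examples, for illustration only.
LABELLED = [
    ("wireless optical mouse", "Computer Accessory"),
    ("usb keyboard and mouse set", "Computer Accessory"),
    ("burlap table runner", "Fabric"),
    ("linen aisle runner", "Fabric"),
]


def tokenize(text):
    """Lowercase a text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(examples):
    """Count how often each token appears under each category."""
    token_counts = defaultdict(Counter)
    category_counts = Counter()
    for text, category in examples:
        category_counts[category] += 1
        token_counts[category].update(tokenize(text))
    vocabulary = {tok for counts in token_counts.values() for tok in counts}
    return token_counts, category_counts, vocabulary


def predict(text, model):
    """Return the category with the highest naive Bayes log-probability."""
    token_counts, category_counts, vocabulary = model
    total = sum(category_counts.values())
    best, best_score = None, -math.inf
    for category, n_examples in category_counts.items():
        counts = token_counts[category]
        denom = sum(counts.values()) + len(vocabulary)  # Laplace smoothing
        score = math.log(n_examples / total)
        for tok in tokenize(text):
            if tok in vocabulary:  # tokens never seen in training are ignored
                score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best, best_score = category, score
    return best


model = train(LABELLED)
print(predict("gaming mouse", model))
```

With these examples the call prints `Computer Accessory`, even though no keyword list maps “mouse” to that category. The same idea is available in ready-made form in libraries such as scikit-learn.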

Can anyone suggest an algorithm or Python library that can classify text descriptions that have no direct keyword match and assign each description to a single category?

I have broken both the text descriptions and the categories down into keywords and then matched them.

This is the code I used to match the text descriptions with the categories.

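[inline]%LivyPy3.pyspark

# Look up each keyword of a description in categories_list; unmatched keywords give None.
entries['category'] = [[categories_list.get(keyword) for keyword in keywords] for keywords in entries['text_description']][/inline]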

However, this script produces either multiple categories or no category at all for a description.