Python Forum

Is there a Python text mining script to classify text with multiple classifications?
Classification of descriptions into categories

I need to determine which category a text description falls under. The descriptions are entered by users and may contain keywords that match a specific category. Each category has its own set of keywords and phrases to match against, and there are about 100 categories. For example, the text description “Burlap aisle runner w/borders” contains the keyword “Burlap”, which belongs to the category “Fabric”, so the description falls under that category.

text description | category

Orange Burlap aisle runner w/borders | Fabric
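
A minimal sketch of this basic keyword matching, in case it helps to make the setup concrete. The keyword sets and the names `CATEGORY_KEYWORDS`, `tokenize` and `match_categories` are made up for illustration; the real data has about 100 categories.

```python
import re

# Hypothetical keyword sets (invented for illustration).
CATEGORY_KEYWORDS = {
    "Fabric": {"burlap", "linen"},
    "Fruit": {"orange", "apple"},
}

def tokenize(description):
    """Lowercase the description and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", description.lower())

def match_categories(description):
    """Return every category that shares at least one keyword with the description."""
    words = set(tokenize(description))
    return sorted(
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if words & keywords
    )

result = match_categories("Burlap aisle runner w/borders")
print(result)
```

Splitting on non-alphanumeric characters means “w/borders” becomes the two words “w” and “borders”, which may or may not be what the real keyword lists expect.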

However, two complications make this categorization more difficult.

First, some text descriptions contain keywords that match multiple categories. A single description could fall under 20 different categories (out of 100) because its keywords appear in the keyword sets of all of them, so the one correct category cannot be picked.

For example, the text description “Orange Burlap aisle runner w/borders” falls under the category “Fruit” because of the keyword “Orange”, and also under “Fabric” because of the keyword “Burlap”.

text description | category

Orange Burlap aisle runner w/borders | Fabric, Fruit
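
A minimal sketch of this ambiguity, together with one possible way to break it: give each keyword a weight and keep only the highest-scoring category. This is an assumption about what might work, not a known solution. The weights, the keyword sets and the names `CATEGORY_WEIGHTS`, `score_categories` and `best_category` are all invented for illustration.

```python
import re

# Hypothetical keyword weights per category (invented for illustration).
# A low weight marks a word that is often only a modifier, such as a color.
CATEGORY_WEIGHTS = {
    "Fabric": {"burlap": 1.0, "linen": 1.0},
    "Fruit": {"orange": 0.3, "apple": 1.0},
}

def score_categories(description):
    """Sum the weights of the matched keywords for each category."""
    words = set(re.findall(r"[a-z0-9]+", description.lower()))
    scores = {}
    for category, weights in CATEGORY_WEIGHTS.items():
        total = sum(weight for keyword, weight in weights.items() if keyword in words)
        if total > 0:
            scores[category] = total
    return scores

def best_category(description):
    """Return the highest-scoring category, or None if nothing matched."""
    scores = score_categories(description)
    if not scores:
        return None
    return max(scores, key=scores.get)

scores = score_categories("Orange Burlap aisle runner w/borders")
print(scores)                                                  # both categories match
print(best_category("Orange Burlap aisle runner w/borders"))  # only the top score is kept
```

With equal scores, `max` simply returns whichever category comes first, so a real implementation would still need an explicit rule for ties.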

Second, some text descriptions contain keywords that do not match any category directly, so the description cannot be categorized at all.

For example, a text description containing the keyword “mouse” should fall under the category “Computer Accessory”, but “mouse” does not match that category directly.

Can anyone suggest an algorithm or Python library that can classify text descriptions that have no direct keyword match, and that assigns a single category when several match?

I have split both the text descriptions and the categories into keywords and then matched them.

This is the code I used to match the text descriptions with the categories:

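[inline]%LivyPy3.pyspark

# categories_list maps keyword -> category; each text_description is a list of keywords.
# Keywords with no entry in categories_list come back as None.
entries['category'] = [[categories_list.get(word) for word in words] for words in entries['text_description']][/inline]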

However, this script gives each description either multiple categories or no category at all.
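
The behaviour can be reproduced on plain Python lists. This is a minimal sketch, assuming `categories_list` is a dict from keyword to category and each description has already been split into lowercase keywords. The sample data and the names `descriptions`, `raw` and `cleaned` are made up for illustration.

```python
# Hypothetical keyword -> category lookup (invented for illustration).
categories_list = {"burlap": "Fabric", "orange": "Fruit"}

# Each description is already split into lowercase keywords.
descriptions = [
    ["orange", "burlap", "aisle", "runner"],
    ["mouse", "pad"],
]

# Same lookup as the script: one result per keyword, None when there is no match.
raw = [[categories_list.get(word) for word in words] for words in descriptions]

# Drop the None placeholders and duplicates to see what is really assigned.
cleaned = [sorted({category for category in row if category is not None}) for row in raw]

print(raw)
print(cleaned)
```

The first description ends up with two categories and the second with none, which are exactly the two failure cases described above.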