Python Forum

Full Version: Converting text data into numeric to use SVM
Hello,
I am very new to Python, so I apologize if this is a silly question with an easy answer. I have recently been assigned a task to take text input and apply machine learning to classify it (true or not true). I have 3 text fields, and a decision has to be made based on them. I want to use SVM, but unfortunately I don't know enough Python to convert my text data into int. If someone can point me in the right direction (tutorial, example) for converting the text data to int, that would be greatly appreciated. I learn most things by taking courses on Pluralsight. I have not figured out which course would guide me through the conversion, hence my question.
Thanks in advance for your help.
(Jan-17-2019, 06:33 PM)mukhan169 Wrote: I don't know enough Python to convert my text data into int. If someone can point me in the right direction (tutorial, example) for converting the text data to int, that would be greatly appreciated.

In Python you convert a string to an int with the built-in function int().

Some examples:

>>> int('10')
10
>>> int('0010')
10
>>> int('1.0')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '1.0'
Maybe I wasn't clear in my question. I have some data as strings, like "clutch repair" and "Customer complaint noise replace the ball joint". Analysts review each record to determine whether it was correct or not. If it was correct, they set a certain flag to True, and if it wasn't, to False. I have to run this data through machine learning, so I have to convert "clutch repair" and "Customer complaint noise replace the ball joint" into int in order to run supervised machine learning on it, as it only takes integers as parameters.
(Jan-18-2019, 01:28 PM)mukhan169 Wrote: Maybe I wasn't clear in my question. I have some data as strings, like "clutch repair" and "Customer complaint noise replace the ball joint". Analysts review each record to determine whether it was correct or not. If it was correct, they set a certain flag to True, and if it wasn't, to False. I have to run this data through machine learning, so I have to convert "clutch repair" and "Customer complaint noise replace the ball joint" into int in order to run supervised machine learning on it, as it only takes integers as parameters.


It doesn't make it any clearer. Do you want to replace specific text like "Customer complaint noise replace the ball joint" with some specific int like 42? And "clutch repair" with 43 or something?
The solution is to use a dictionary:
table = {
    "Customer complaint noise replace the ball joint": 42,
    "clutch repair": 43,
}

data = 'clutch repair'
print(table[data]) # prints 43
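
With many records, a table like this does not have to be typed by hand. The following is only a minimal sketch of building it automatically; the helper name build_table and the sample records are made up for illustration. Note that the resulting numbers are arbitrary IDs, so they say nothing about how similar two texts are.

```python
def build_table(texts, start=0):
    """Assign each distinct string the next unused integer, in first-seen order."""
    table = {}
    for text in texts:
        if text not in table:
            table[text] = start + len(table)
    return table


# Hypothetical sample records; repeated strings get the same number.
records = [
    "clutch repair",
    "Customer complaint noise replace the ball joint",
    "clutch repair",
]
table = build_table(records)
encoded = [table[text] for text in records]
print(encoded)  # prints [0, 1, 0]
```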
Yes, I am, but I want to use tf and idf (TF-IDF). I can't hard-code a number for each string, as there will be over 100k records. Once I have converted the strings to TF-IDF, I should be able to use the SVM algorithm.
Does that make any more sense?
I guess something like this, but a lot simpler. Again, I am very new to this.
https://www.kaggle.com/sudhirnl7/simple-...es-xgboost
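
For readers landing on this thread: a rough sketch of what converting text to tf and idf numbers means, using only the standard library. The function names, the whitespace tokenizer, the plain log(N / df) weighting and the sample records are all assumptions made for illustration, not anything from the linked notebook. In practice, scikit-learn's TfidfVectorizer produces this kind of matrix (with somewhat different smoothing and normalization), and its output can be passed to an SVM classifier such as LinearSVC.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; real tokenizers do more."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({word for words in tokenized for word in words})
    n_docs = len(documents)

    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for words in tokenized for word in set(words))
    # Plain idf = log(N / df); libraries usually add smoothing on top.
    idf = {word: math.log(n_docs / doc_freq[word]) for word in vocabulary}

    vectors = []
    for words in tokenized:
        counts = Counter(words)
        total = len(words)
        # tf = share of the document's words that are this word.
        vectors.append([
            (counts[word] / total) * idf[word] if total else 0.0
            for word in vocabulary
        ])
    return vocabulary, vectors


# Hypothetical records; the 3 text fields of a record could be joined
# into one string like these before vectorizing.
records = [
    "clutch repair",
    "customer complaint noise replace the ball joint",
    "clutch noise",
]
vocabulary, vectors = tfidf_vectors(records)
```

Each record becomes a list of floats with one position per word in the vocabulary, so all records have the same length and can be fed to a classifier together with the analysts' True/False flags as labels.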