Sep-08-2021, 02:12 AM
I am completely new to RandomForest and machine learning. Any help would be appreciated! Thank you!
Example of the dataset:

| ID | sentiment | review | source |
|----|-----------|--------|--------|
| '5' | 0 | lousy movie | twitter |
| '6' | 1 | excellent acting | website |
| '7' | 0 | bad script, but wonderful actors | feedback |

I created a bag-of-words (BOW) representation of the review column:
```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier

# Load 'Sheet1' of the workbook; header=0 uses the first row as column names
file_location = 'C:/Desktop/test.xlsx'
xlsx = pd.ExcelFile(file_location, engine='openpyxl')
df = xlsx.parse('Sheet1', header=0)

bow = df['review']
Y_train = df['sentiment']

# Learn the vocabulary and turn each review into a sparse row of word counts
vect = CountVectorizer()
bow = vect.fit_transform(bow)
```

I then created another DataFrame and added both the BOW and the source column to it.
Is this correct? Can I add the sparse matrix of bow into a df?
```python
df1 = pd.DataFrame(bow)
df1['source'] = df['source']
X_train = df1.values
print(X_train)
```

Output of print(X_train):
```
[[<1x16 sparse matrix of type '<class 'numpy.int64'>' with 6 stored elements in Compressed Sparse Row format> 'twitter']
 [<1x16 sparse matrix of type '<class 'numpy.int64'>' with 5 stored elements in Compressed Sparse Row format> 'website']
 [<1x16 sparse matrix of type '<class 'numpy.int64'>' with 2 stored elements in Compressed Sparse Row format> 'feedback']
```

Train the RandomForest Model
```python
forest = RandomForestClassifier(n_estimators=100)
forest = forest.fit(X_train, Y_train)
```

Error:
ValueError: setting an array element with a sequence
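The printed X_train shows the problem. It is an object array whose cells are whole 1x16 sparse matrices plus strings such as 'twitter'. `RandomForestClassifier` needs a purely numeric 2-D matrix, and NumPy trying to turn each sparse-matrix cell into a single number appears to be what raises `ValueError: setting an array element with a sequence`. So putting the sparse matrix into a DataFrame this way does not seem to give usable features. Below is a minimal sketch of one common alternative, assuming scikit-learn and SciPy are installed. The inline DataFrame stands in for test.xlsx and holds only the three example rows. One-hot encoding `source` with `OneHotEncoder` is my assumption about how that column should be used. The two sparse blocks are joined with `scipy.sparse.hstack` and are not put into a DataFrame. The names `enc`, `source` and `new` are illustrative only.

```python
import pandas as pd
from scipy.sparse import hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import OneHotEncoder

# Example rows from the post, built inline instead of read from test.xlsx
df = pd.DataFrame({
    "ID": ["5", "6", "7"],
    "sentiment": [0, 1, 0],
    "review": ["lousy movie", "excellent acting", "bad script, but wonderful actors"],
    "source": ["twitter", "website", "feedback"],
})

# Bag-of-words for the review text: sparse matrix of shape (n_rows, vocab_size)
vect = CountVectorizer()
bow = vect.fit_transform(df["review"])

# One-hot encode the categorical 'source' column (OneHotEncoder expects 2-D input)
enc = OneHotEncoder(handle_unknown="ignore")
source = enc.fit_transform(df[["source"]])

# Join the two sparse blocks side by side into one numeric feature matrix
X_train = hstack([bow, source]).tocsr()
Y_train = df["sentiment"]

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, Y_train)

# New rows must go through the same fitted transformers (transform, not fit_transform)
new = pd.DataFrame({"review": ["wonderful movie"], "source": ["twitter"]})
X_new = hstack([vect.transform(new["review"]), enc.transform(new[["source"]])]).tocsr()
print(forest.predict(X_new))
```

With these three rows the vocabulary has 9 words and `source` has 3 categories, so X_train has 12 numeric columns. `sklearn.compose.ColumnTransformer` can wrap the vectorizer and the encoder so that the same transforms are applied at fit and predict time. The manual `hstack` version just makes each step visible.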