Mar-23-2017, 06:26 PM
from sklearn.feature_extraction.text import CountVectorizer

sentences = ['dont iterate over rows of dataframe',
             'try to use dataframe indexing']

# Learn the vocabulary and build one count vector per sentence
vec = CountVectorizer()
vectors = vec.fit_transform(sentences).toarray()

# vocabulary_ maps word -> column index; print it as sorted (index, word) pairs
print(sorted((v, k) for k, v in vec.vocabulary_.items()))
print(vectors[0])
print(vectors[1])
Output:
[(0, 'dataframe'), (1, 'dont'), (2, 'indexing'), (3, 'iterate'), (4, 'of'), (5, 'over'), (6, 'rows'), (7, 'to'), (8, 'try'), (9, 'use')]
[1 1 0 1 1 1 1 0 0 0]
[1 0 1 0 0 0 0 1 1 1]
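
To make it clearer where those two rows come from, here is a rough standard-library sketch of the same counting. It is not how scikit-learn is implemented. It only assumes the CountVectorizer defaults of lowercasing, keeping tokens of two or more word characters, and ordering the vocabulary alphabetically. The helper names `tokenize` and `vectorize` are made up for this illustration.

```python
import re
from collections import Counter

sentences = ['dont iterate over rows of dataframe',
             'try to use dataframe indexing']

def tokenize(text):
    # Assumption: mimic the default behaviour (lowercase, tokens of 2+ word chars)
    return re.findall(r"\b\w\w+\b", text.lower())

# Alphabetically sorted vocabulary; a word's position is its column index
vocabulary = sorted({tok for s in sentences for tok in tokenize(s)})

def vectorize(text):
    # Count each token, then read the counts off in vocabulary order
    counts = Counter(tokenize(text))
    return [counts.get(word, 0) for word in vocabulary]

vectors = [vectorize(s) for s in sentences]

print(list(enumerate(vocabulary)))
for row in vectors:
    print(row)
```

Each position in a row is the number of times the word at that index of the vocabulary occurs in the sentence, which is why 'dataframe' (index 0) is 1 in both rows.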