Hi
For the test text below,
test=['test test', 'test toy'],
the tf-idf scores are
[['test', 1.0]]
[['test', 0.35], ['toy', 0.94]]
I am not sure how these scores are arrived at. Can someone show a worked example?
When I change to NO normalisation (smartirs='ntn'), I get
[['test', 1.17]]
[['test', 0.58], ['toy', 1.58]]
This doesn't seem to tally with what I get via direct computation of
tfidf(w, d) = tf x idf
E.g. for the word "test" in doc 1 ('test test'):
tf = 2
idf = log(2/2) = 0
tf-idf = 2 x 0 = 0
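For reference, this is a minimal sketch of that direct computation. It assumes 'ntn' means raw term count, idf = log2(N/df), and no normalisation (the docs/df names are just mine):

import math

test = ['test test', 'test toy']
docs = [doc.split() for doc in test]
n_docs = len(docs)

# document frequency: number of documents containing each word
df = {}
for doc in docs:
    for word in set(doc):
        df[word] = df.get(word, 0) + 1

# 'ntn': raw count x log2(N / df), no normalisation
for doc in docs:
    print({w: doc.count(w) * math.log2(n_docs / df[w]) for w in sorted(set(doc))})

This prints {'test': 0.0} for doc 1 and {'test': 0.0, 'toy': 1.0} for doc 2, not the gensim numbers above.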
Here is the code I used:

from gensim import corpora, models
from gensim.utils import simple_preprocess
import numpy as np

test = ['test test', 'test toy']
texts = [simple_preprocess(doc) for doc in test]
mydict = corpora.Dictionary(texts)
mycorpus = [mydict.doc2bow(doc, allow_update=True) for doc in texts]

tfidf = models.TfidfModel(mycorpus, smartirs='ntc')
for doc in tfidf[mycorpus]:
    print([[mydict[id], np.around(freq, decimals=2)] for id, freq in doc])
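The smartirs='ntn' numbers quoted above come from the same script with only the model line changed:

tfidf = models.TfidfModel(mycorpus, smartirs='ntn')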