Python Forum
gensim (TfidfModel): How much is the Tf-Idf computed?

gensim (TfidfModel): How much is the Tf-Idf computed? - JaneTan - Jun-28-2021

Hi,

For the test corpus below,
test = ['test test', 'test toy']
the tf-idf scores (with smartirs='ntc') are
[['test', 1.0]]
[['test', 0.35], ['toy', 0.94]]

I am not sure how these values are arrived at. Can someone show a worked example?

When I switch to no normalisation (smartirs='ntn'), I get

[['test', 1.17]]
[['test', 0.58], ['toy', 1.58]]

This doesn't tally with what I get when I compute it directly as

tfidf(w, d) = tf(w, d) x idf(w)

E.g.

doc 1, for the word "test":
tf = 1
idf = log(2/2) = 0
tf-idf = 0
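
Note on the mismatch: the 'ntn' figures quoted above are consistent with a smoothed idf of log2((N + 1) / df) rather than log(N / df), which appears to be what gensim's 't' ("zero-corrected idf") option computes, combined with a raw-count tf (so tf = 2 for "test" in doc 1). The following is only a sketch in plain Python under that assumption; the helper names are made up for illustration and are not part of gensim.

```python
import math
from collections import Counter

docs = [["test", "test"], ["test", "toy"]]
n_docs = len(docs)

# Document frequency: number of documents that contain each word
df = Counter(word for doc in docs for word in set(doc))

def zero_corrected_idf(word):
    # Assumed formula: log2((N + 1) / df); stays above zero even when
    # the word occurs in every document
    return math.log2((n_docs + 1) / df[word])

def raw_tfidf(doc):
    # 'n' term frequency: the raw count of the word in the document
    counts = Counter(doc)
    return {word: count * zero_corrected_idf(word) for word, count in counts.items()}

scores = [raw_tfidf(doc) for doc in docs]
for s in scores:
    print({word: round(value, 2) for word, value in s.items()})
# {'test': 1.17}
# {'test': 0.58, 'toy': 1.58}
```

With plain log(N / df), "test" would score 0 in both documents, which is the result of the hand calculation above.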

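import numpy as np
from gensim import corpora, models
from gensim.utils import simple_preprocess

test = ['test test', 'test toy']

# Tokenise each document into a list of lowercase words
texts = [simple_preprocess(doc) for doc in test]

# Map tokens to integer ids, then convert each document to (id, count) pairs.
# The dictionary already covers these texts, so allow_update is not needed.
mydict = corpora.Dictionary(texts)
mycorpus = [mydict.doc2bow(doc) for doc in texts]
# smartirs='ntc': raw term frequency, idf, cosine normalisation
tfidf = models.TfidfModel(mycorpus, smartirs='ntc')

for doc in tfidf[mycorpus]:
    print([[mydict[token_id], np.around(weight, decimals=2)] for token_id, weight in doc])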
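
Note on the 'ntc' output: the final letter 'c' adds cosine normalisation, meaning each document's weight vector is divided by its Euclidean length. A rough sketch of that step in plain Python follows. It starts from unnormalised weights computed as tf x log2((N + 1) / df), which is an assumption chosen because it matches the 'ntn' output quoted earlier, and `cosine_normalise` is an illustrative helper, not a gensim function.

```python
import math

# Unnormalised weights per document, assuming tf * log2((N + 1) / df) with N = 2
raw = [
    {"test": 2 * math.log2(3 / 2)},
    {"test": math.log2(3 / 2), "toy": math.log2(3 / 1)},
]

def cosine_normalise(weights):
    # Divide every weight by the L2 norm so the vector has unit length
    norm = math.sqrt(sum(v * v for v in weights.values()))
    return {word: v / norm for word, v in weights.items()}

normalised = [cosine_normalise(w) for w in raw]
for w in normalised:
    print({word: round(v, 2) for word, v in w.items()})
# {'test': 1.0}
# {'test': 0.35, 'toy': 0.94}
```

A document with a single distinct word always normalises to 1.0, whatever its idf, which would explain the 1.0 for doc 1.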