Python Forum
gensim (TfidfModel): How much is the Tf-Idf computed?
Hi

For the test text below,
test = ['test test', 'test toy']
the tf-idf scores are
[['test', 1.0]]
[['test', 0.35], ['toy', 0.94]]

I am not sure how these values are arrived at. Can someone show a worked example?

When I change to NO normalisation (smartirs='ntn'), I get

[['test', 1.17]]
[['test', 0.58], ['toy', 1.58]]

This doesn't seem to tally with what I get via direct computation of

tfidf(w, d) = tf x idf

E.g.

doc 1, for the word "test":
tf = 2 (the word occurs twice in 'test test')
idf = log(2/2) = 0
tf-idf = 2 x 0 = 0

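Here is the code:

import numpy as np
from gensim import corpora, models
from gensim.utils import simple_preprocess

test = ['test test', 'test toy']

# Tokenize each document into a list of lowercase tokens
texts = [simple_preprocess(doc) for doc in test]

# Map tokens to integer ids, then build the bag-of-words corpus
mydict = corpora.Dictionary(texts)
mycorpus = [mydict.doc2bow(doc, allow_update=True) for doc in texts]

# SMART 'ntc': raw term frequency, idf, cosine normalisation
tfidf = models.TfidfModel(mycorpus, smartirs='ntc')

for doc in tfidf[mycorpus]:
    print([[mydict[token_id], np.around(weight, decimals=2)] for token_id, weight in doc])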
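
For reference, here is a minimal sketch of the arithmetic in plain Python (no gensim). The posted numbers are consistent with an idf of log2((N + 1) / df) rather than log(N / df). As far as I know, this is how recent gensim versions define the 't' (zero-corrected idf) letter in smartirs, but treat that as an assumption and check the documentation for your installed version. The helper names below are hypothetical.

```python
import math
from collections import Counter

docs = [["test", "test"], ["test", "toy"]]  # tokenized 'test test', 'test toy'
n_docs = len(docs)

# Document frequency: number of documents that contain each term
df = Counter(term for doc in docs for term in set(doc))


def idf_zero_corrected(term):
    # Assumed formula: log2((N + 1) / df). Stays positive even when df == N.
    return math.log2((n_docs + 1) / df[term])


def tfidf_raw(doc):
    # 'ntn'-style weights: raw term count times idf, no normalisation
    return {term: count * idf_zero_corrected(term)
            for term, count in Counter(doc).items()}


def tfidf_cosine(doc):
    # 'ntc'-style weights: raw weights divided by their Euclidean (L2) norm
    raw = tfidf_raw(doc)
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {term: w / norm for term, w in raw.items()}


for doc in docs:
    print({t: round(w, 2) for t, w in tfidf_raw(doc).items()},
          {t: round(w, 2) for t, w in tfidf_cosine(doc).items()})
# {'test': 1.17} {'test': 1.0}
# {'test': 0.58, 'toy': 1.58} {'test': 0.35, 'toy': 0.94}
```

With the plain log(N / df) formula, "test" appears in both documents and gets idf = log(2/2) = 0, which is why the hand computation gives 0. Under the assumed (N + 1) variant, "test" gets idf = log2(3/2), about 0.58, and "toy" gets log2(3/1), about 1.58. For doc 2 the cosine norm is sqrt(0.58^2 + 1.58^2), about 1.69, giving 0.35 and 0.94.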