gensim (TfidfModel): How much is the Tf-Idf computed?
Python Forum (https://python-forum.io), Data Science forum
JaneTan - Jun-28-2021

Hi,

For the test text below, test=['test test', 'test toy'], the tf-idf scores with smartirs='ntc' (as in my code) are:

[['test', 1.0]]
[['test', 0.35], ['toy', 0.94]]

I am not sure how these are arrived at. Can someone show a worked example?

When I switch to no normalisation (smartirs='ntn'), I get:

[['test', 1.17]]
[['test', 0.58], ['toy', 1.58]]

This doesn't seem to tally with what I get by computing tfidf(w, d) = tf x idf directly. E.g. for the word "test" in doc 1:

tf = 1
idf = log(2/2) = 0
tf-idf = 0

My code:

```python
import numpy as np
from gensim import corpora, models
from gensim.utils import simple_preprocess

test = ['test test', 'test toy']
texts = [simple_preprocess(doc) for doc in test]                      # tokenise each document
mydict = corpora.Dictionary(texts)                                     # token -> integer id
mycorpus = [mydict.doc2bow(doc, allow_update=True) for doc in texts]  # (id, count) pairs
tfidf = models.TfidfModel(mycorpus, smartirs='ntc')                    # natural tf, idf, cosine norm
for doc in tfidf[mycorpus]:
    print([[mydict[term_id], np.around(weight, decimals=2)] for term_id, weight in doc])
```
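One way to check where the numbers come from is to redo the arithmetic by hand in plain Python, without gensim. This is a minimal sketch under assumptions that are not confirmed in the thread itself: that in gensim's SMART scheme the tf letter 'n' uses the raw term count in the document, the df letter 't' uses idf = log2((N + 1) / df), and the normalisation letter 'c' divides each document vector by its Euclidean (cosine) length. The helper names `idf_t`, `weights_ntn` and `weights_ntc` are hypothetical and only for illustration.

```python
import math
from collections import Counter

# The two toy documents from the post, already tokenised.
docs = [["test", "test"], ["test", "toy"]]
n_docs = len(docs)

# Document frequency: the number of documents each term appears in.
df = Counter(term for doc in docs for term in set(doc))


def idf_t(term):
    # ASSUMED SMART 't' weighting: log2((N + 1) / df), not log(N / df).
    return math.log2((n_docs + 1) / df[term])


def weights_ntn(doc):
    # 'n' = raw count of the term in this doc, 't' = idf above, 'n' = no normalisation.
    counts = Counter(doc)
    return {term: tf * idf_t(term) for term, tf in counts.items()}


def weights_ntc(doc):
    # Same weights, then 'c' = divide by the vector's Euclidean length.
    w = weights_ntn(doc)
    norm = math.sqrt(sum(v * v for v in w.values()))
    return {term: v / norm for term, v in w.items()}


for doc in docs:
    ntn = {t: round(v, 2) for t, v in weights_ntn(doc).items()}
    ntc = {t: round(v, 2) for t, v in weights_ntc(doc).items()}
    print("ntn:", ntn, "ntc:", ntc)
# ntn: {'test': 1.17} ntc: {'test': 1.0}
# ntn: {'test': 0.58, 'toy': 1.58} ntc: {'test': 0.35, 'toy': 0.94}
```

Under these assumptions, "test" in doc 1 gets 2 x log2(3/2) ≈ 1.17 rather than 0: the count is 2 (not a relative frequency of 1), and the 't' idf uses N + 1 = 3 in the numerator, so a term present in every document still gets a non-zero weight. In doc 2 the vector (0.585, 1.585) has length about 1.69, which gives the normalised 0.35 and 0.94.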