Python Forum
Python entropy calculation
#1
With this Python code I calculate the entropy of a txt file based on trigrams, but something goes wrong: the output is 110.51908986855025, which is far too high considering that the maximum entropy of a file is 8 bits with a base-2 logarithm. Can anybody find the mistake (or mistakes)?

My apologies for posting the code this way, but the "code" button doesn't work for some reason.

import math

file = open("C:/Users/Mik/Documents/import pandas as pd.txt", "r")
content = file.read()

def generate_trigrams(string):
    trigrams = []
    for i in range(len(string) - 2):
        trigram = string[i:i + 3]
        trigrams.append(trigram)
    return trigrams

trigrams_content = generate_trigrams(content)
n_trigrams = len(trigrams_content)
print(n_trigrams)

def entropy(trigrams_content):
    trigram_freqs = {}
    for trigram in trigrams_content:
        if trigram in trigram_freqs:
            trigram_freqs[trigram] += 1
        else:
            trigram_freqs[trigram] = 1

    probs = [trigram_freqs[trigram] / len(trigrams_content) for trigram in trigrams_content]

    entropy = 0
    for prob in probs:
        entropy -= prob * math.log2(prob)

    return entropy

print(entropy(trigrams_content))
Gribouillis wrote Dec-25-2023, 09:44 PM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
#2
Cross-posted on Stack Overflow.
« We can solve any problem by introducing an extra level of indirection »
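
Going only by the code as posted, a likely cause is the `probs` list in `entropy()`: it loops over `trigrams_content` (every occurrence) rather than over the distinct trigrams, so a trigram that occurs k times adds its `p * log2(p)` term k times and the sum is no longer a Shannon entropy. Below is a minimal sketch of one way to fix it, summing once per distinct trigram with `collections.Counter`. The name `trigram_entropy` and the sample string are made up for illustration and stand in for the file contents. Note also that 8 bits is the ceiling for single bytes; for trigrams the ceiling is `log2` of the number of distinct trigrams, which can be larger.

```python
import math
from collections import Counter


def generate_trigrams(text):
    """Return every overlapping 3-character window of text."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def trigram_entropy(trigrams):
    """Shannon entropy in bits per trigram, one term per distinct trigram."""
    counts = Counter(trigrams)
    total = len(trigrams)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical sample text; in the original post this would be file.read().
sample = "abcabcabc"
trigrams = generate_trigrams(sample)
h = trigram_entropy(trigrams)

# Upper bound for this input: log2 of the number of distinct trigrams.
upper_bound = math.log2(len(set(trigrams)))

print(h)
print(upper_bound)
```

With this change the result should never exceed `upper_bound`. If a per-character figure is wanted for comparison against the 8-bit limit, dividing the trigram entropy by 3 is one rough convention, though that is an assumption about what the poster wants to measure.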