Python Forum
How to compute conditional unigram probabilities?
#1
Hello,

I'm having difficulties with my homework (Task 4).
I don't know how to approach it.
I already have an attempt, but I think it is wrong and I don't know how to continue.
The task gives me pseudocode as a hint, but I can't turn it into code.
I know it's not that hard and it's only a few lines, but I have no idea what to do.

My code for Tasks 1, 2, and 3, and my attempt at Task 4:
 
import re

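class Ngram:
    # Task 1
    def __init__(self, filename="", n=0):
        self.filename = filename
        self.n = n
        # Create the dictionaries per instance; dicts defined at class level
        # would be shared by every Ngram object.
        self.raw_counts = {}
        self.prob = {}
        self.cond_prob = {}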

    # Task 2
    def extract_raw_counts(self):
        # Assumes the corpus file is UTF-8 encoded (it contains German umlauts).
        with open(self.filename, 'r', encoding='utf-8') as fp:
            for line in fp:
                tokenLst = tokenize_smart(line.rstrip("\r\n"))
                # Pad with n-1 boundary markers on each side.
                for i in range(self.n - 1):
                    tokenLst.insert(0, "BOS")
                    tokenLst.append("EOS")
                # +1 so that the last n-gram (the one ending in "EOS") is counted too.
                for i in range(len(tokenLst) - self.n + 1):
                    newTuple = tuple(tokenLst[i:i + self.n])
                    self.raw_counts[newTuple] = self.raw_counts.get(newTuple, 0) + 1
        
    # Task 3
    def extract_probabilities(self):
        sumRawCounts = sum(self.raw_counts.values()) + len(self.raw_counts)
        for key in self.raw_counts:
            self.prob[key] = self.raw_counts[key] / sumRawCounts

    # Task 4
    def extract_conditional_probabilities(self):
        # my attempt for Task 4
        for key in self.prob:
            mgram = key[0:self.n-1]
            unigram = key[self.n]
            if not mgram in self.prob:
                self.prob[mgram] = {}
            else:
                self.cond_prob[mgram] = unigram
            
        
        pass

    # Task 5
    def generate_random_token(self, mgram):
        """
        Generate a random next token based on an n-1 gram,
        taking into account the probability distribution over the possible next tokens for that n-1-gram.

        :param mgram: the n-1 gram to generate the next token for.
        :type mgram: a tuple (of length n-1) of strings.
        :return a random next token for the n-1-gram.
        :rtype str
        """
        pass

    # Task 6
    def generate_random_sentence(self):
        """
        Generate a random sentence.

        :return a random sentence
        :rtype list[str]
        """
        pass


def tokenize_smart(sentence):
    """
    Tokenize the sentence into tokens (words, punctuation).

    :param sentence: the sentence to be tokenized
    :type sentence: str
    :return: list of tokens in the sentence
    :rtype: list[str]
    """
    tokens = []
    for word in re.sub(r" +", " ", sentence).split():
        word = re.sub(r"[\"„”“»«`\(\)]", "", word)
        if word != "":
            if word[-1] in ".,!?;:":
                if len(word) == 1:
                    tokens += [word]
                else:
                    tokens += [word[:-1], word[-1]]
            else:
                tokens.append(word)

    return tokens


def list2str(sentence):
    """
    Convert a sentence given as a list of strings to the sentence as a string separated by whitespace.
    
    :param sentence: the string list to be joined
    :type sentence: list[str]
    :return: sentence as a string, separated by whitespace
    :rtype: str
    """
    sentence = " ".join(sentence)
    sentence = re.sub(r" ([\.,!\?;:])", r"\1", sentence)
    return sentence


if __name__ == '__main__':
    
    # Task 1
    print("Task 1:")
    ngram_model = Ngram("de-sentences-tatoeba.txt", 2)
    print(ngram_model.n, ngram_model.filename)
    print(ngram_model.raw_counts, ngram_model.prob, ngram_model.cond_prob)
    
    # Task 2
    print("\nTask 2:")
    ngram_model.extract_raw_counts()
    print(ngram_model.raw_counts[("kaltes", "Land")])
    print(ngram_model.raw_counts[("schönes", "Land")])
    
    # Task 3
    print("\nTask 3:")
    ngram_model.extract_probabilities()
    print(ngram_model.prob[("kaltes", "Land")])
    print(ngram_model.prob[("schönes", "Land")])
    
    '''
    # Task 4
    ngram_model.extract_conditional_probabilities()
    print(ngram_model.cond_prob[("beobachteten",)])
    print(ngram_model.cond_prob[("schönes",)]["Land"])
    # Task 5
    print(ngram_model.generate_random_token(("den",)))
    print(ngram_model.generate_random_token(("den",)))
    print(ngram_model.generate_random_token(("den",)))
    # Task 6
    print(list2str(ngram_model.generate_random_sentence()))
    print(list2str(ngram_model.generate_random_sentence()))
    '''
Task 1 and 2:
[Image: c8kWdT3]

Task 3 and 4: (this is where I get stuck)
[Image: nnd3pD4]


Can someone help me or give me a hint?

Thank you in advance
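
One possible way to approach Task 4, sketched under assumptions because the task's pseudocode is only available in the image: take the conditional probability of a token given the preceding (n-1)-gram to be the n-gram's probability divided by the summed probability of all n-grams that share that (n-1)-gram prefix. The standalone function `conditional_probabilities` and the toy `toy_prob` table are hypothetical and not part of the assignment. The nested layout (prefix tuple -> token -> probability) is inferred from the way the test code indexes `cond_prob` first by a one-element tuple and then by a token.

```python
from collections import defaultdict


def conditional_probabilities(prob, n):
    """Return {mgram: {token: P(token | mgram)}} from n-gram probabilities.

    Hypothetical helper: `prob` maps n-gram tuples of length n to probabilities.
    """
    # First pass: total probability mass of each (n-1)-gram prefix.
    prefix_totals = defaultdict(float)
    for ngram, p in prob.items():
        prefix_totals[ngram[:n - 1]] += p

    # Second pass: normalise each n-gram by the mass of its prefix.
    cond_prob = {}
    for ngram, p in prob.items():
        mgram = ngram[:n - 1]   # the conditioning context, kept as a tuple
        token = ngram[n - 1]    # the last token sits at index n-1, not n
        cond_prob.setdefault(mgram, {})[token] = p / prefix_totals[mgram]
    return cond_prob


# Toy bigram probabilities (made-up numbers, not taken from the corpus).
toy_prob = {
    ("schönes", "Land"): 0.125,
    ("schönes", "Haus"): 0.375,
    ("kaltes", "Land"): 0.5,
}
toy_cond = conditional_probabilities(toy_prob, 2)
print(toy_cond[("schönes",)]["Land"])  # 0.125 / (0.125 + 0.375) = 0.25
```

Because each n-gram is divided by the mass of its own prefix, every inner dictionary sums to 1 regardless of how the probabilities in Task 3 were normalised. Inside the class, the same two passes could run over `self.prob` and write into `self.cond_prob`.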
Reply
#2
cross-post on SO

Reply
#3
(Jan-25-2020, 02:22 PM)buran Wrote: cross-post on SO


Hi buran,

Can you delete this thread?
I'm not allowed to post my code publicly on the internet because other students in my class could copy it.
Then I would get 0 points for this homework.

Thank you.
Reply
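
For Task 5, the docstring of `generate_random_token` in the first post asks for a next token drawn according to the probability distribution for a given (n-1)-gram. A minimal sketch, assuming `cond_prob` has the nested layout prefix tuple -> {token: probability}: the free function `sample_next_token`, its optional `rng` parameter, and the toy table are hypothetical names introduced only for illustration. It relies on `random.choices` from the standard library, which draws items according to the supplied weights.

```python
import random


def sample_next_token(cond_prob, mgram, rng=random):
    """Draw one next token for `mgram`, weighted by its conditional probability."""
    distribution = cond_prob[mgram]        # {token: P(token | mgram)}
    tokens = list(distribution.keys())
    weights = list(distribution.values())
    # random.choices returns a list of k samples; take the single draw.
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy conditional distribution (made-up numbers, not taken from the corpus).
toy_cond_prob = {("den",): {"Hund": 0.5, "Mann": 0.25, "Tisch": 0.25}}
rng = random.Random(0)  # seeded only so the demo is repeatable
print([sample_next_token(toy_cond_prob, ("den",), rng) for _ in range(5)])
```

Repeated calls can return different tokens, which matches the three identical `generate_random_token(("den",))` calls in the test code. More probable tokens simply come up more often.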

