Python Forum

Full Version: AttributeError: 'Response' object has no attribute 'replace'
#! python3
# Using the inauguration speech of William Henry Harrison analyzed in the
# previous example, we can write the following code that generates arbitrarily
# long Markov chains (with the chain length set to 100) based on the
# structure of its text.

import requests
from random import randint


def wordListSum(wordList):
    sum = 0
    for word, value in wordList.items():
        sum += value
    return sum


def retrieveRandomWord(wordList):
    randIndex = randint(1, wordListSum(wordList))
    for word, value in wordList.items():
        randIndex -= value
        if randIndex <= 0:
            return word


def buildWordDict(text):
    # Remove newlines and quotes
    text = text.replace("\n", " ")
    text = text.replace('"', "")
    # Make sure punctuation marks are treated as their own "words"
    # so that they will be included in the Markov chain
    punctuation = [",", ".", ";", ":"]
    for symbol in punctuation:
        text = text.replace(symbol, " " + symbol + " ")
    words = text.split(" ")
    # Filter out empty words
    words = [word for word in words if word != ""]
    wordDict = {}
    for i in range(1, len(words)):
        if words[i - 1] not in wordDict:
            # Create a new dictionary for this word
            wordDict[words[i - 1]] = {}
        if words[i] not in wordDict[words[i - 1]]:
            wordDict[words[i - 1]][words[i]] = 0
        wordDict[words[i - 1]][words[i]] += 1
    return wordDict


text = requests.get("http://pythonscraping.com/files/inaugurationSpeech.txt")
if text.status_code == 200:
    content = text.text
wordDict = buildWordDict(text)


# Generate a Markov chain of length 100
length = 100
chain = ""
currentWord = "I"
for i in range(0, length):
    chain += currentWord + " "
    currentWord = retrieveRandomWord(wordDict[currentWord])

print(chain)
Error:
Traceback (most recent call last):
  File "C:\Python36\kodovi\markov.py", line 49, in <module>
    wordDict = buildWordDict(text)
  File "C:\Python36\kodovi\markov.py", line 25, in buildWordDict
    text = text.replace("\n", " ")
AttributeError: 'Response' object has no attribute 'replace'
There is a replace method in the documentation, so does this message mean that there is no newline attribute?
And what is wrong with line 49?
requests.get returns a Response object, which has no replace method. If you want to use the string replace method, you need to get the text attribute of the Response object and use replace on that.
Thank you, this is how it should be done:
text = requests.get("http://pythonscraping.com/files/inaugurationSpeech.txt")
if text.status_code == 200:
    content = text.text
wordDict = buildWordDict(content)
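One small caveat with that snippet: if the status code is anything other than 200, content is never assigned and buildWordDict(content) fails with a NameError. A minimal sketch of one way to guard against that, using raise_for_status from requests, could look like this (with buildWordDict defined as above):

import requests

response = requests.get("http://pythonscraping.com/files/inaugurationSpeech.txt")
# Raise an HTTPError right away if the server returned a 4xx/5xx status,
# instead of silently continuing with an undefined content variable
response.raise_for_status()
content = response.text
wordDict = buildWordDict(content)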
It may be off topic, but would anyone be so kind as to give me a further explanation of lines 36-42? It's about creating a two-dimensional dictionary, but it looks confusing to me.
Line 36 is looping through the indexes of the words list. Not very pythonic. You should loop over items, not the indexes. This loop is meant to be over pairs of consecutive words, but there are ways to do that without resorting to indexes.

Lines 37 and 39 create a sub-dictionary for the previous word (note the for loop starts with 1, the index of the second word) if it doesn't already have one. You could use collections.defaultdict(dict) instead.

Lines 40 and 41 create a zero count for the current word if it's not in the sub-dictionary for the previous word. Again, you could do this with collections.defaultdict(int) instead, but it's not clear to me how to initialize the nested defaultdicts.

Finally, line 42 adds one to the count of the current word in the previous word's sub-dictionary. So it's creating a dictionary with counts of pairs of words. For example, wordDict['spam']['eggs'] would be the number of times the word 'eggs' occurred after the word 'spam'.
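To make that concrete, here is a rough sketch (untested, not from the book) of how that whole block could be rewritten without indexes, using zip for the consecutive word pairs and a nested defaultdict so the initialization happens automatically:

from collections import defaultdict


def buildWordDict(text):
    # Same cleanup as the original code
    text = text.replace("\n", " ").replace('"', "")
    for symbol in [",", ".", ";", ":"]:
        text = text.replace(symbol, " " + symbol + " ")
    words = [word for word in text.split(" ") if word != ""]
    # The outer defaultdict maps each word to a dict of follower counts;
    # the inner defaultdict(int) starts every follower count at 0, so the
    # membership checks from lines 37-41 are no longer needed
    wordDict = defaultdict(lambda: defaultdict(int))
    # zip(words, words[1:]) yields each consecutive (previous, current) pair
    for previous, current in zip(words, words[1:]):
        wordDict[previous][current] += 1
    return wordDict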
I am ashamed of my ignorance but will dare to ask. What does this piece of code mean:
for word, value in wordList.items():
    randIndex -= value
    if randIndex <= 0:
        return word
Why randIndex -= value?
It's a way to do a weighted random selection. You have a dictionary of words and their weights (the values). You make a random number from 1 to the total of the weights (that's what is in randIndex). Then you go through the words and subtract each word's value from randIndex (randIndex -= value is equivalent to randIndex = randIndex - value). When randIndex gets to zero or less, that is your weighted random choice. It makes it so that a word with value v is selected with a probability v / t, where t is the sum of all the values.

In 3.6+, the random module has choices, which can handle this for you more efficiently. However, prior to that you have to use code like the above.
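Assuming Python 3.6 or later, a sketch of retrieveRandomWord built on random.choices might look something like this (untested):

import random


def retrieveRandomWord(wordList):
    # random.choices does the weighted selection for us: each word is picked
    # with probability value / total, e.g. {'spam': 3, 'eggs': 1} returns
    # 'spam' about 75% of the time
    words = list(wordList.keys())
    weights = list(wordList.values())
    # choices returns a list (of length k=1 by default), so take the first item
    return random.choices(words, weights=weights)[0]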
Does that mean that some words will never be returned? Or they will because dictionaries are unordered? In that case, why do we need to do this calculation if we can randomly choose any word?
(Mar-19-2019, 11:31 PM)Truman Wrote: [ -> ]Does that mean that some words will never be returned? Or they will because dictionaries are unordered? In that case, why do we need to do this calculation if we can randomly choose any word?

The only time a word would never be returned is if its weight was 0. Then it would have a probability of 0 / t of being returned. It means that some words are more likely to be returned than others, and some words are less likely to be returned than others. The standard random.choice() selects every item in the provided sequence with equal likelihood.
(Mar-19-2019, 11:37 PM)ichabod801 Wrote: [ -> ]
The only time a word would never be returned is if its weight was 0. Then it would have a probability of 0 / t of being returned. It means that some words are more likely to be returned than others, and some words are less likely to be returned than others. The standard random.choice() selects every item in the provided sequence with equal likelihood.

That's exactly why I don't understand why the author of this code used weighted random selection. Wouldn't it be more 'fair' if each word had an equal chance of being selected?