May-17-2020, 01:09 PM
Good evening. I have recently been studying chatbots using the Multinomial Naive Bayes approach, and I ran into a problem in the code when collecting data from a JSON file. What am I missing?
[
  {
    "main_category": "News & Events",
    "question": "Why did the U.S Invade Iraq ?",
    "answer": "A small group of politicians believed strongly that the fact that Saddam Hussien remained in power after the first Gulf War was a signal of weakness to the rest of the world, one that invited attacks and terrorism. Shortly after taking power with George Bush in 2000 and after the attack on 9/11, they were able to use the terrorist attacks to justify war with Iraq on this basis and exaggerated threats of the development of weapons of mass destruction. The military strength of the U.S. and the brutality of Saddam's regime led them to imagine that the military and political victory would be relatively easy."
  },
  {
    "main_category": "Education & Reference",
    "question": "How to get rid of a beehive?",
    "answer": "Call an area apiarist. They should be able to help you and would most likely remove them at no charge in exchange for the hive. The bees have value and they now belong to you."
  }
]
import nltk
from nltk.corpus import stopwords
from nltk.stem.lancaster import LancasterStemmer
import json

stemmer = LancasterStemmer()

intents = json.loads(open('data/intents.json', 'r').read())

training_data = []
for k, row in enumerate(intents):
    training_data.append(row['main_category'])
    training_data.append(row['question'])

# capture unique stemmed words in the training corpus
corpus_words = {}
class_words = {}
classes = list(set([a['main_category'] for a in training_data]))
for c in classes:
    class_words[c] = []

for data in training_data:
    # tokenize each sentence into words
    for word in nltk.word_tokenize(data['question']):
        # ignore a few things
        if word not in ["?", "'s"]:
            # stem and lowercase each word
            stemmed_word = stemmer.stem(word.lower())
            if stemmed_word not in corpus_words:
                corpus_words[stemmed_word] = 1
            else:
                corpus_words[stemmed_word] += 1
            class_words[data['question']].extend([stemmed_word])

# we now have each word and the number of occurrences of the word in our training corpus (the word's commonality)
print("Corpus words and counts: %s" % corpus_words)
# also we have all words in each class
print("Class words: %s" % class_words)
Error:Traceback (most recent call last):
File "main.py", line 22, in <module>
classes = list(set([a['main_category'] for a in training_data]))
File "main.py", line 22, in <listcomp>
classes = list(set([a['main_category'] for a in training_data]))
TypeError: string indices must be integers
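The traceback points at the cause: `training_data` is built by appending `row['main_category']` and `row['question']` as two separate strings, so each `a` in the comprehension is a string, and `a['main_category']` tries to index a string with a string key. Keeping each intent dict whole in `training_data` fixes it (and note that `class_words` should be keyed by `data['main_category']`, not `data['question']`, or the loop would hit a `KeyError` next). A minimal sketch of the corrected data flow, using the two intents shown above inline and plain `str.split()`/`str.lower()` as dependency-free stand-ins for `nltk.word_tokenize` and the Lancaster stemmer:

```python
import json

# the same two intents as in data/intents.json above (answers omitted)
intents_json = '''
[
  {"main_category": "News & Events", "question": "Why did the U.S Invade Iraq ?"},
  {"main_category": "Education & Reference", "question": "How to get rid of a beehive?"}
]
'''
intents = json.loads(intents_json)

# keep each intent dict whole, so every element of training_data
# supports both data['main_category'] and data['question']
training_data = list(intents)

classes = sorted(set(a['main_category'] for a in training_data))
class_words = {c: [] for c in classes}
corpus_words = {}

for data in training_data:
    # str.split() stands in for nltk.word_tokenize here, just to keep
    # the sketch self-contained; lower() stands in for stemmer.stem()
    for word in data['question'].split():
        if word not in ["?", "'s"]:
            token = word.lower()
            corpus_words[token] = corpus_words.get(token, 0) + 1
            # key by the class, not by the question text
            class_words[data['main_category']].append(token)

print("Classes: %s" % classes)
print("Corpus words and counts: %s" % corpus_words)
print("Class words: %s" % class_words)
```

With the dicts kept intact, the comprehension over `a['main_category']` works, and each class accumulates the tokens of its own questions; in the real script you would swap the stand-ins back for `nltk.word_tokenize` and `LancasterStemmer`.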