May-17-2020, 01:09 PM
Good evening. I have recently been studying chatbots using the Multinomial Naive Bayes approach, and I ran into a problem in the code when collecting data from a JSON file. What am I missing?
[
  {
    "main_category": "News & Events",
    "question": "Why did the U.S Invade Iraq ?",
    "answer": "A small group of politicians believed strongly that the fact that Saddam Hussien remained in power after the first Gulf War was a signal of weakness to the rest of the world, one that invited attacks and terrorism. Shortly after taking power with George Bush in 2000 and after the attack on 9/11, they were able to use the terrorist attacks to justify war with Iraq on this basis and exaggerated threats of the development of weapons of mass destruction. The military strength of the U.S. and the brutality of Saddam's regime led them to imagine that the military and political victory would be relatively easy."
  },
  {
    "main_category": "Education & Reference",
    "question": "How to get rid of a beehive?",
    "answer": "Call an area apiarist. They should be able to help you and would most likely remove them at no charge in exchange for the hive. The bees have value and they now belong to you."
  }
]
import nltk
from nltk.corpus import stopwords
from nltk.stem.lancaster import LancasterStemmer
import json

stemmer = LancasterStemmer()

intents = json.loads(open('data/intents.json', 'r').read())

training_data = []
for k, row in enumerate(intents):
    training_data.append(row['main_category'])
    training_data.append(row['question'])

# capture unique stemmed words in the training corpus
corpus_words = {}
class_words = {}
classes = list(set([a['main_category'] for a in training_data]))
for c in classes:
    class_words[c] = []

for data in training_data:
    # tokenize each sentence into words
    for word in nltk.word_tokenize(data['question']):
        # ignore a few things
        if word not in ["?", "'s"]:
            # stem and lowercase each word
            stemmed_word = stemmer.stem(word.lower())
            if stemmed_word not in corpus_words:
                corpus_words[stemmed_word] = 1
            else:
                corpus_words[stemmed_word] += 1
            class_words[data['question']].extend([stemmed_word])

# we now have each word and the number of occurrences of the word in our training corpus (the word's commonality)
print("Corpus words and counts: %s" % corpus_words)
# also we have all words in each class
print("Class words: %s" % class_words)
Error:Traceback (most recent call last):
File "main.py", line 22, in <module>
classes = list(set([a['main_category'] for a in training_data]))
File "main.py", line 22, in <listcomp>
classes = list(set([a['main_category'] for a in training_data]))
TypeError: string indices must be integers
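The traceback points at the cause: `training_data` is built by appending `row['main_category']` and `row['question']` as two separate strings, so each `a` in the comprehension is a string, and `a['main_category']` tries to index a string with a string key. Keeping each intent dict whole in `training_data` fixes it (and note that `class_words` should be keyed by `data['main_category']`, not `data['question']`, or the loop would hit a `KeyError` next). A minimal sketch of the corrected data flow, using the two intents shown above inline and plain `str.split()`/`str.lower()` as dependency-free stand-ins for `nltk.word_tokenize` and the Lancaster stemmer:

```python
import json

# the same two intents as in data/intents.json above (answers omitted)
intents_json = '''
[
  {"main_category": "News & Events", "question": "Why did the U.S Invade Iraq ?"},
  {"main_category": "Education & Reference", "question": "How to get rid of a beehive?"}
]
'''
intents = json.loads(intents_json)

# keep each intent dict whole, so every element of training_data
# supports both data['main_category'] and data['question']
training_data = list(intents)

classes = sorted(set(a['main_category'] for a in training_data))
class_words = {c: [] for c in classes}
corpus_words = {}

for data in training_data:
    # str.split() stands in for nltk.word_tokenize here, just to keep
    # the sketch self-contained; lower() stands in for stemmer.stem()
    for word in data['question'].split():
        if word not in ["?", "'s"]:
            token = word.lower()
            corpus_words[token] = corpus_words.get(token, 0) + 1
            # key by the class, not by the question text
            class_words[data['main_category']].append(token)

print("Classes: %s" % classes)
print("Corpus words and counts: %s" % corpus_words)
print("Class words: %s" % class_words)
```

With the dicts kept intact, the comprehension over `a['main_category']` works, and each class accumulates the tokens of its own questions; in the real script you would swap the stand-ins back for `nltk.word_tokenize` and `LancasterStemmer`.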