Machine Learning errors

obi95 · Mar-13-2018, 06:02 PM

Hi everyone, I'm currently in the process of trying to use machine learning to sort through emails but whenever I run the code, I keep getting errors, one such error is saying 'list' object has no attribute 'most_common'. I have tried to figure out why it is doing this but can not figure out why nor how to fix it. Below is my code, if anyone can help I'd be really thankful. The error appears at the dictionary = dictionary.most_common side. I dont know if there are any more errors as the program will not go past this part

import os
import numpy as np
from collections import Counter
from sklearn.naive_bayes import MultinomialNB, GaussianNB, BernoulliNB
from sklearn.svm import SVC, NuSVC, LinearSVC

def make_dictionary(train_dir):
    emails = [os.path.join(train_dir,f) for f in os.listdir(train_dir)]
    all_words = []
    for mail in emails:
        with open(mail) as m:
            for i,line in enumerate(m):
                if i == 2:
                    words = line.split()
                    all_words += words
    dictionary = Counter(all_words)
    return dictionary
train_dir = 'train-mails'
dictionary = make_dictionary(train_dir)
list_to_remove = dictionary.keys()
for item in list_to_remove:
    if item.isalpha() == False:
        del dictionary[item]
    elif len(item) == 1:
        del dictionary[item]
    dictionary = dictionary.most_common(3000)

def extract_features(mail_dir):
    files = [os.path.join(mail_dir,fi) for fi in os.listdir(mail_dir)]
    features_matrix = np.zeros((len(files),3000))
    docID = 0;
    for fil in files:
        with open(fil) as fi:
            for i,line in enumerate(fi):
                if i == 2:
                    words = line.split()
                for word in words:
                  wordID = 0
                  for i,d in enumerate(dictionary):
                    if d[0] == word:
                      wordID = i
                      features_matrix[docID,wordID] = words.count(word)
            docID = docID + 1
        return features_matrix

train_labels = np.zeros(702)
train_labels[351:701] = 1
train_matrix = extract_features(train_dir)


model1 = MultinomialNB()
model2 = LinearSVC()
model1.fit(train_matrix,train_labels)
model2.fit(train_matrix,train_labels)

test_dir = 'test-mails'
test_matrix = extract_features(test_dir)
test_labels = np.zeros(260)
test_labels[130:260] = 1
result1 = model1.predict(test_matrix)
result2 = model2.predict(test_matrix)
print confusion_matrix(test_labels,result1)
print confusion_matrix(test_labels,result2)

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Feature Selection in Machine Learning	shiv11	4	1,880	Apr-09-2024, 02:22 PM Last Post: DataScience
	[machine learning] identifying a number 0-9 from a 28x28 picture, not working	SheeppOSU	0	1,856	Apr-09-2021, 12:38 AM Last Post: SheeppOSU
	Getting started in Machine Learning	Harshil	5	3,251	Dec-07-2020, 04:06 PM Last Post: sridhar
	Python Machine Learning: For Data Extraction	JaneTan	0	1,858	Nov-24-2020, 06:45 AM Last Post: JaneTan
	IndexError in Array while trying to do machine learning	Mariaoye	0	1,900	Nov-12-2020, 12:35 AM Last Post: Mariaoye
	Errors with Machine Learning trading bot-- not sure why	MattKahn13	0	1,373	Aug-07-2020, 08:19 PM Last Post: MattKahn13
	How useful is PCA for machine learning?	Marvin93	0	1,541	Aug-07-2020, 02:07 PM Last Post: Marvin93
	How to extract data from paragraph using Machine Learning with python?	bccsthilina	2	3,061	Jul-27-2020, 07:02 AM Last Post: hussainmujtaba
	Machine Learning: Process	Enanda	13	4,322	Mar-18-2020, 02:02 AM Last Post: jefsummers
	Machine Learning Polynomial Regression	braveYug	0	1,721	Nov-13-2019, 11:41 AM Last Post: braveYug

Machine Learning errors

User Panel Messages

Announcements