Python Forum

Full Version: Homework question
So I have this assignment for a Coursera course I am taking. Here is the assignment:

Write a program that prompts for a file name, then opens that file and reads through the file, looking for lines of the form:
X-DSPAM-Confidence: 0.8475
Count these lines and extract the floating point values from each of the lines and compute the average of those values and produce an output as shown below. Do not use the sum() function or a variable named sum in your solution.
You can download the sample data at http://www.py4e.com/code3/mbox-short.txt. When you are testing, enter mbox-short.txt as the file name.

And here is what I have written for my code:

fname = input("Enter file name: ")
fh = open(fname).read()
count=0
total=0
for line in fh:
    if not line.startswith("X-DSPAM-Confidence:") : continue
    count=count+1
    pos=line.find(":")
    number= line[pos+1:]
    total=total+float(number)
avg = total/count    
print("Average spam confidence: ",avg)
However, when I try to test the code, it says that I cannot divide by zero, which I suppose means there is a problem with my count and total for the average. Can anyone see if they can spot where I've gone wrong? Thanks in advance.
Perhaps use open(fname) instead of open(fname).read()
I guess it did not find the string, or fh is empty.
Instead of keeping a running total and a count, you can append the values to a list or use a generator.
The list knows its own size, so once you have the list you have all the information you need: the values and how many there are.
If the list is empty, skip the calculation. This saves you from dividing by 0.

import statistics
import io

# just to get the data online
import requests


def get_confidence_avg(fd):
    """
    Yield each X-DSPAM-Confidence value as a float.

    fd should be a file-like object in text mode
    or an iterable with str (lines) as elements.
    """
    for line in fd:
        if line.startswith('X-DSPAM-Confidence:'):
            yield float(line.split(':')[1])


mail_fd = io.StringIO(requests.get('https://www.py4e.com/code3/mbox-short.txt').text)
# mail_fd is now a file-like object that holds the data from the example
# with a local file you can use this instead:
# with open(somefile) as fd:
#     confidences = list(get_confidence_avg(fd))

confidences = list(get_confidence_avg(mail_fd))  # populate the list with the confidence values

if confidences:
    mean = statistics.mean(confidences)
    median = statistics.median(confidences)
    # or calculating the mean manually
    mean_manual = sum(confidences) / len(confidences)
    print('Mean / Mean manual / Median')
    print(mean, mean_manual, median)
else:
    print('Did not find any match. No information about confidence.')
PS: Normally open().read() should work, and the file object should be garbage collected afterwards. Being explicit is better, though: use a context manager or the more modern pathlib. The Path object has the methods read_text and read_bytes, and both use a context manager under the hood.
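The two explicit options from the PS could look roughly like this. It is only a sketch: the temporary file and its contents are made up so the example runs on its own, and in the assignment you would use the file name from input() instead.

```python
import tempfile
from pathlib import Path

# Hypothetical sample file so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
sample_path = Path(tmp_dir) / "sample.txt"
sample_path.write_text("X-DSPAM-Confidence: 0.8475\nSubject: test\n")

# Option 1: a context manager closes the file when the block ends.
with open(sample_path) as fd:
    lines_from_open = fd.readlines()
file_closed = fd.closed

# Option 2: pathlib opens, reads and closes the file in one call.
text = sample_path.read_text()
lines_from_path = text.splitlines(keepends=True)

print(file_closed)
print(lines_from_open == lines_from_path)
```

Either way the file is closed deterministically instead of whenever the garbage collector gets to it.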
(Apr-09-2019, 01:10 PM) Gribouillis wrote: Perhaps use open(fname) instead of open(fname).read()


This solved the problem. Thanks so much!
Think about this special case: what happens if 'X-DSPAM-Confidence:' is not in the mail?
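One possible way to handle that case in the counting style of the original program, without sum() as the assignment requires. This is a sketch under assumptions: the function name average_confidence and the choice to return None when nothing matches are illustrative, not part of the assignment, and the sample data is made up.

```python
import io


def average_confidence(lines):
    """Return the mean X-DSPAM-Confidence value, or None if no line matches.

    `lines` is any iterable of str lines, e.g. an open text file.
    The name and the None return value are illustrative choices.
    """
    count = 0
    total = 0.0
    for line in lines:
        if not line.startswith("X-DSPAM-Confidence:"):
            continue
        count = count + 1
        # Everything after the first colon is the number; float() ignores
        # the surrounding whitespace and the trailing newline.
        total = total + float(line[line.find(":") + 1:])
    if count == 0:
        return None  # nothing matched, so avoid dividing by zero
    return total / count


# Made-up sample data; with a real file use: with open(fname) as fh: ...
with_matches = io.StringIO("X-DSPAM-Confidence: 0.5\nX-DSPAM-Confidence: 0.25\n")
without_matches = io.StringIO("Subject: hello\n")

print(average_confidence(with_matches))     # 0.375
print(average_confidence(without_matches))  # None
```

The caller can then check for None and print a message instead of an average.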
By the way, I hadn't realized that you were iterating over a string.

Iterating on an open file object => lines
Iterating on a str object => single chars
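
A small demonstration of that difference, using io.StringIO as a stand-in for an open text file (the sample text is made up):

```python
import io

text = "ab\ncd\n"

# A str yields one character at a time.
chars = list(text)

# A file-like object in text mode yields one line at a time.
lines = list(io.StringIO(text))

print(chars)  # ['a', 'b', '\n', 'c', 'd', '\n']
print(lines)  # ['ab\n', 'cd\n']
```

No single character can start with 'X-DSPAM-Confidence:', so with open(fname).read() the count stays at 0 and the division fails.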