Using dictionary to find the most sent emails from a file

siliusu · Apr-21-2021, 10:59 PM

Hi Everyone,
I have been stuck on this question for a whole week. The staff of our course is not helpful at all. I hope I can get a clue from you guys. Thanks.
This is my assignment:
Write a program to read through the mbox-short.txt and figure out who has sent the greatest number of mail messages. The program looks for 'From ' lines and takes the second word of those lines as the person who sent the mail. The program creates a Python dictionary that maps the sender's mail address to a count of the number of times they appear in the file. After the dictionary is produced, the program reads through the dictionary using a maximum loop to find the most prolific committer.
File mbox-short: https://www.py4e.com/code3/mbox-short.txt

This is my code to it:

name=input("Enter file: ")
fh=open(name)
largest=None
counts=dict()
for line in fh:
    if line.startswith("From "):
        x=line.split()
        emails=x[1]
        print(emails)
        for word in emails:
            counts[word]=counts.get(word,0)+1
            if largest is None or counts[word] > largest:
                largest=counts[word]
            print(counts,largest)

It ends up counting every letter instead of each email. How can I count the emails?
I also tried looping over x with "for word in x:", but then it counts everything: "From", the email, the time and the date. How can I pick out only the emails and their counts? Thank you!

DPaul · (This post was last modified: Apr-22-2021, 06:33 AM by DPaul.)

Hi,

Without having seen the format of the mbox file, this seems very straightforward.
All you need are the lines that start with "From"; after the split, line[0] is "From" and line[1] is the sender's address.
The keys of the dictionary are then the distinct line[1] values, effectively a set() of them.
The number of Froms = the number of emails sent. Why count anything else?
Unless (and this is not clear from your question) the sender may also appear elsewhere in the file (as a recipient?) and
you need to count those too?
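To make the line[0]/line[1] idea concrete, here is a minimal sketch on one made-up line in mbox format (not taken from mbox-short.txt). It also shows why `for word in emails:` counted letters: iterating over a string yields its characters.

```python
# A made-up 'From ' line in mbox format, for illustration only.
line = "From alice@edu  Thu Jun 16 16:12:12 2005\n"

words = line.split()
print(words[0])  # From
print(words[1])  # alice@edu  (the sender)

# Iterating over a string yields single characters, which is why
# `for word in emails:` counted letters instead of addresses.
print([ch for ch in words[1]][:5])  # ['a', 'l', 'i', 'c', 'e']

# Count the address itself, once per 'From ' line.
counts = {}
email = words[1]
counts[email] = counts.get(email, 0) + 1
print(counts)  # {'alice@edu': 1}
```

In the real program the last three lines go inside the `if line.startswith("From "):` branch, with no inner loop over `emails`.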

Paul

DeaD_EyE · Apr-22-2021, 07:39 AM

The From lines look like this:

From alice@edu  Thu Jun 16 16:12:12 2005
From bob@gov   Thu Jun 16 18:13:12 2005
From ted@com  Thu Jul 28 09:53:31 2005
From bob@gov  Thu Jul 28 09:59:31 2005
From ted@com  Thu Jul 28 15:53:31 2005

I got it from an example: doc/networkx-2.1/examples/drawing/unix_email.mbox

From alice@edu  Thu Jun 16 16:12:12 2005
From: Alice <alice@edu>
Subject: NetworkX
Date: Thu, 16 Jun 2005 16:12:13 -0700
To: Bob <bob@gov>
Status: RO
Content-Length: 86
Lines: 5

Bob, check out the new networkx release - you and
Carol might really like it.

Alice

From bob@gov   Thu Jun 16 18:13:12 2005
Return-Path: <bob@gov>
Subject: Re: NetworkX
From: Bob <bob@gov>
To: Alice <alice@edu>
Content-Type: text/plain
Date: Thu, 16 Jun 2005 18:13:12 -0700
Status: RO
Content-Length: 26
Lines: 4

Thanks for the tip.

Bob

From ted@com  Thu Jul 28 09:53:31 2005
Return-Path: <ted@com>
Subject: Graph package in Python?
From: Ted <ted@com>
To: Bob <bob@gov>
Content-Type: text/plain
Date: Thu, 28 Jul 2005 09:47:03 -0700
Status: RO
Content-Length: 90
Lines: 3

Hey Ted - I'm looking for a Python package for
graphs and networks.  Do you know of any?

From bob@gov  Thu Jul 28 09:59:31 2005
Return-Path: <bob@gov>
Subject: Re: Graph package in Python?
From: Bob <bob@gov>
To: Ted <ted@com>
Content-Type: text/plain
Date: Thu, 28 Jul 2005 09:59:03 -0700
Status: RO
Content-Length: 180
Lines: 9

Check out the NetworkX package - Alice sent me the tip!

Bob

>> bob@gov scrawled:
>> Hey Ted - I'm looking for a Python package for
>> graphs and networks.  Do you know of any?

From ted@com  Thu Jul 28 15:53:31 2005
Return-Path: <ted@com>
Subject: get together for lunch to discuss Networks?
From: Ted <ted@com>
To: Bob <bob@gov>, Carol <carol@gov>, Alice <alice@edu>
Content-Type: text/plain
Date: Thu, 28 Jul 2005 15:47:03 -0700
Status: RO
Content-Length: 139
Lines: 5

Hey everyrone!  Want to meet at that restaurant on the
island in Konigsburg tonight?  Bring your laptops
and we can install NetworkX.

Ted
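Each message in the sample above has both an envelope line starting with "From " and a "From:" header line. The trailing space in the assignment's 'From ' is what keeps the header from being counted a second time. A quick check on a shortened copy of the first message:

```python
# A shortened copy of the first message in the sample above.
sample = """From alice@edu  Thu Jun 16 16:12:12 2005
From: Alice <alice@edu>
Subject: NetworkX
To: Bob <bob@gov>

Bob, check out the new networkx release
"""

lines = sample.splitlines()

# startswith("From") also matches the "From:" header line.
loose = [ln for ln in lines if ln.startswith("From")]
# startswith("From ") (with the trailing space) matches only the envelope line.
strict = [ln for ln in lines if ln.startswith("From ")]

print(len(loose))            # 2
print(len(strict))           # 1
print(strict[0].split()[1])  # alice@edu
```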

from collections import defaultdict
from collections import Counter

# test = defaultdict(int)
# test["Not Existing Key"] -> 0
# test["Not Existing Key"] += 1
# then test["Not Existing Key"] -> 1


# Counter counts the unique objects in a list or
# any other iterable. If you pass it a mapping such as a dict or defaultdict,
# the existing counts are copied instead.


name = input("Enter file: ")
fh = open(name)
# a context manager is better

counts = defaultdict(int)
for line in fh:
    if line.startswith("From "):
        email = line.split(maxsplit=2)[1]
        # maxsplit=2 limits the result to at most 3 elements:
        # "From", the email, and the rest of the line.
        # We only need the email, which is the second element.
        counts[email] += 1
        # defaultdict(int) and Counter both support this:
        # a missing key starts at 0. With a defaultdict,
        # the default factory must be int for this to work.


# you forgot to close the file in your code;
# with a context manager this cannot happen
fh.close()


print("Results:")
for email, count in counts.items():
    print(email, "->", count)


# we already have the results in `counts`,
# so Counter can reuse the data

counts2 = Counter(counts)
# Counter has the method most_common
print()
print("Top 5:")
for email, count in counts2.most_common(5):
    print(email, "->", count)

Counter could be used in the first place instead of defaultdict.
You can also do this manually, which is a good way to learn how to remember which elements you have already seen.
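Using Counter from the start, together with the context manager recommended in the comments of the code above, might look like this sketch. `count_senders` and `count_senders_in_file` are hypothetical helper names, and the lines in `sample` are made up.

```python
from collections import Counter


def count_senders(lines):
    """Map each sender address to the number of 'From ' lines it appears on."""
    counts = Counter()
    for line in lines:
        if line.startswith("From "):
            counts[line.split()[1]] += 1
    return counts


def count_senders_in_file(path):
    # The with-block closes the file automatically, even if an error occurs.
    with open(path) as fh:
        return count_senders(fh)


sample = [
    "From alice@edu  Thu Jun 16 16:12:12 2005\n",
    "From: Alice <alice@edu>\n",
    "From bob@gov   Thu Jun 16 18:13:12 2005\n",
    "From alice@edu  Thu Jul 28 09:53:31 2005\n",
]
print(count_senders(sample).most_common(1))  # [('alice@edu', 2)]
```

With the real file, `count_senders_in_file("mbox-short.txt").most_common(1)` gives the top sender. Taking an iterable of lines keeps the counting logic testable without a file.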

If you write your own counter, you can use a set() to store the e-mails you have already seen.

emails = ["a", "a", "c", "b"]
seen = set()
result = {}
for email in emails:
    if email in seen:
        result[email] += 1
    else:
        result[email] = 1
        seen.add(email)
        # a set uses add to add elements
        # a list uses append
        # a set has only unique elements
        # and is very fast in checking containment of an element in the set
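Dictionary keys are already unique, and `in` on a dict is just as fast as on a set, so the dictionary itself can record which e-mails have been seen. A sketch with the same input list, plus the `dict.get` variant from the original question:

```python
emails = ["a", "a", "c", "b"]

result = {}
for email in emails:
    # The dict's keys play the role of the `seen` set.
    if email in result:
        result[email] += 1
    else:
        result[email] = 1

print(result)  # {'a': 2, 'c': 1, 'b': 1}

# dict.get with a default of 0 folds the if/else into one line.
result2 = {}
for email in emails:
    result2[email] = result2.get(email, 0) + 1

print(result2 == result)  # True
```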

I hope this helps you understand a little better.

perfringo · Apr-22-2021, 03:30 PM

This is homework, so special attention must be paid to the terms and conditions:

- The program looks for 'From ' lines and takes the second word of those lines as the person who sent the mail.
- The program creates a Python dictionary that maps the sender's mail address to a count of the number of times they appear in the file.
- After the dictionary is produced, the program reads through the dictionary using a maximum loop to find the most prolific committer.

I have trouble understanding what a 'maximum loop' is, so I will use the built-in max.

counter = dict()

with open('mbox-short.txt', 'r') as f:
    for line in f:
        if line.startswith('From '):
            address = line.split(maxsplit=2)[1]
            try:
                counter[address] += 1
            except KeyError:
                counter[address] = 1


print(max(counter, key=lambda rec: rec[1]))
# [email protected]
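In py4e, "maximum loop" most likely refers to the pattern of walking through values while remembering the largest one seen so far (this reading of the assignment is an assumption). Applied to the dictionary's items, a sketch could be as follows. The counts are taken from the five-message sample earlier in the thread, and `bigemail`/`bigcount` are illustrative names.

```python
# Counts from the five-message sample earlier in the thread.
counter = {"alice@edu": 1, "bob@gov": 2, "ted@com": 2}

bigemail = None
bigcount = None
for email, count in counter.items():
    # Keep the first address whose count beats everything seen so far.
    if bigcount is None or count > bigcount:
        bigemail = email
        bigcount = count

print(bigemail, bigcount)  # bob@gov 2
```

Because the comparison is a strict `>`, ties keep the address seen first.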

DeaD_EyE · Apr-22-2021, 03:58 PM

Iterating over a dict yields only the keys, not the items (key, value).

This raises an IndexError if any key has a length of 0 or 1. If all keys are longer than that,
max compares them by the second character of the key.

print(max(counter, key=lambda rec: rec[1]))

Use the items method instead; it returns a (key, value) tuple for each item.

print(max(counter.items(), key=lambda rec: rec[1]))
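A small made-up dictionary makes the difference visible: with plain iteration, `rec[1]` is a character of the address, while with `.items()` it is the count.

```python
# A made-up dictionary where the two approaches disagree.
counter = {"bz@x": 1, "aa@x": 9}

# Iterating a dict gives keys, so rec[1] is the key's second character:
# 'z' > 'a', so the address with count 1 "wins".
print(max(counter, key=lambda rec: rec[1]))          # bz@x

# .items() gives (key, value) tuples, so rec[1] is the count.
print(max(counter.items(), key=lambda rec: rec[1]))  # ('aa@x', 9)

# A one-character key has no rec[1], so the key function raises IndexError.
try:
    max({"a": 1}, key=lambda rec: rec[1])
except IndexError as exc:
    print("IndexError:", exc)
```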

lambda creates an anonymous function.
Here it is used as a key function, so max compares the values instead of the keys.
The values are your counts and the keys are the email addresses.

key_function = lambda rec: rec[1]

# is similar to:

def key_function(rec):
    return rec[1]

This key_function can be passed as the key argument to max, min, sorted and itertools.groupby.

These functions compare elements by the return value of the key function.

A tiny example:

mapping = {"a": 3, "b": 2, "c": 1}
print(mapping)

# max via key
print("Biggest key (lexicographical order) ->", max(mapping.items(), key=lambda item: item[0]))


# max via value
print("Biggest integer", max(mapping.items(), key=lambda item: item[1]))

Output:
Biggest key (lexicographical order) -> ('c', 1)
Biggest integer ('a', 3)
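The same key functions work with min and sorted, for example to rank all senders by count. `operator.itemgetter(1)` from the standard library is a common alternative to `lambda item: item[1]`:

```python
from operator import itemgetter

mapping = {"a": 3, "b": 2, "c": 1}

# Smallest value instead of largest.
print(min(mapping.items(), key=lambda item: item[1]))  # ('c', 1)

# sorted with reverse=True ranks items from largest to smallest value.
ranking = sorted(mapping.items(), key=lambda item: item[1], reverse=True)
print(ranking)  # [('a', 3), ('b', 2), ('c', 1)]

# itemgetter(1) behaves like lambda item: item[1].
print(max(mapping.items(), key=itemgetter(1)))  # ('a', 3)
```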

perfringo · Apr-22-2021, 04:39 PM

(Apr-22-2021, 03:58 PM) DeaD_EyE wrote: Iterating over a dict yields only the keys and not the items (key, value).

You are absolutely correct.

Strangely enough, on this particular dataset this code produced the correct result (gwen's count is 5, which is the maximum in this dictionary). It is a gentle reminder that code should always be tested...
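One cheap way to test it is a small hand-made input whose correct answer is known, chosen so the keys-only bug cannot pass by luck. `most_prolific` is a hypothetical helper and the data is invented.

```python
def most_prolific(lines):
    """Return (address, count) for the sender with the most 'From ' lines."""
    counts = {}
    for line in lines:
        if line.startswith("From "):
            address = line.split()[1]
            counts[address] = counts.get(address, 0) + 1
    return max(counts.items(), key=lambda rec: rec[1])


# 'ozzy@x' has the larger second character ('z' > 'm') but only one
# message, so a keys-only max would pick the wrong address.
sample = [
    "From ozzy@x  Mon Jan  1 00:00:00 2024\n",
    "From amy@x  Mon Jan  1 01:00:00 2024\n",
    "From: Amy <amy@x>\n",
    "From amy@x  Mon Jan  1 02:00:00 2024\n",
]

assert most_prolific(sample) == ("amy@x", 2)

# The keys-only version fails this test.
counts = {"ozzy@x": 1, "amy@x": 2}
assert max(counts, key=lambda rec: rec[1]) == "ozzy@x"
print("ok")
```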

siliusu · Apr-22-2021, 06:07 PM

Thank you so much for your reply!
I tried the code. It shows a traceback on "counter[address] += 1", saying it must be integers or slices, not str.
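That message is what Python gives when a list (or tuple) is indexed with a string, so a likely cause is that `counter` refers to a list rather than a dict in the script that was run. The full script isn't shown, so this is a guess. A minimal reproduction:

```python
counter = []  # a list, not a dict

try:
    counter["alice@edu"] += 1
except TypeError as exc:
    print(exc)  # list indices must be integers or slices, not str

counter = dict()  # a dict accepts string keys
counter["alice@edu"] = counter.get("alice@edu", 0) + 1
print(counter)  # {'alice@edu': 1}
```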

(Apr-22-2021, 03:30 PM) perfringo wrote: This is homework so special attention must be paid to terms and conditions.

