Python Forum
Basic one: Aggregating from a dictionary
#1
I am so sorry, that's a stupid mistake (I forgot to return!). Feel free to delete!


Sorry, I haven't coded for 2 years (and I wasn't any good when I did!) and now I need to write a Python script.
Basically, I need to tell people which files they own on the system. I've managed to get the files "crawled", producing a JSON-like dictionary:
{'file1.txt':'John','file2.txt':'James','file3.txt':'John'}

My next step is to aggregate them like so (this is the format my next method expects):
{'John':['file1.txt','file3.txt'],'James':['file2.txt']}

I don't care about the order. Plus I know how to .sort() if I need to :)
I have a feeling I might be able to use NumPy, a Pandas DataFrame or some other instant solution, but I don't know these yet and I don't want to jump the gun.
Also, it really irritates me that I have something that should work but I seem to be missing a point about language semantics.

Here's what I got:
def group_by_owners(to_group):
    # get list of owners:
    owners = to_group.values()
    # Remove duplicated values:
    owners = list(set(owners))
    # Prepare a list of tuples from the dict:
    ownership_tuples = to_group.items()

    # Find files per each owner:
    result = {}
    for current_owner in owners:
        current_owner_files=[]
        # Brute search through data:
        for (file_name, owner_name) in ownership_tuples:
            if owner_name == current_owner:
                current_owner_files.append(file_name)
        result[current_owner] = current_owner_files

files = {
    'file1.txt': 'John',
    'file2.txt': 'James',
    'file3.txt': 'John'
}

print (group_by_owners(files))
Running this prints: None

1. I thought that with "(file_name, owner_name)" I would be able to traverse the "ownership_tuples" list but maybe I am wrong?
2. I googled for an hour but I couldn't find help on how to set a dictionary value when the key is a variable. I just guessed here:
result[current_owner]
3. When I do print(ownership_tuples) I get:
dict_items([('file1.txt', 'John'), ('file2.txt', 'James'), ('file3.txt', 'John')])
What is this 'dict_items'? Is it an indicator that I am doing something wrong? I expect to print a tuple, not a dictionary.

PS: any ideas to make this more elegant will be happily accepted, as I am just warming up my brain to programming again and could use some inspiration!
PS2: It's a surprise I get so much done with computers. I am a bottom-feeder in the world of programming. Apologies for my beginner questions!
#2
It can probably be done with NumPy too, but there is defaultdict in the collections module:

from collections import defaultdict
import json

# Map each owner to the list of files they own
spam = defaultdict(list)
files = {'file1.txt': 'John', 'file2.txt': 'James', 'file3.txt': 'John'}
for fname, owner in files.items():
    spam[owner].append(fname)
print(spam)

# dump as json
eggs = json.dumps(spam)
print(eggs)
Output:
defaultdict(<class 'list'>, {'John': ['file1.txt', 'file3.txt'], 'James': ['file2.txt']})
{"John": ["file1.txt", "file3.txt"], "James": ["file2.txt"]}
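
On the three numbered questions from the first post, here is a minimal sketch rather than a definitive version. It keeps the original function name group_by_owners, adds the missing return, and swaps the nested loops for dict.setdefault (my substitution, not something from the original code). Unpacking "(file_name, owner_name)" from .items() is fine, result[current_owner] = ... is the normal way to set a value when the key is a variable, and dict_items is just a view over the dictionary's pairs, not a sign that anything is wrong.

```python
def group_by_owners(to_group):
    """Turn {file: owner} into {owner: [files]}."""
    result = {}
    for file_name, owner_name in to_group.items():
        # setdefault stores an empty list the first time an owner is seen,
        # then returns that list so the file name can be appended to it
        result.setdefault(owner_name, []).append(file_name)
    return result  # without this line the function returns None


files = {
    'file1.txt': 'John',
    'file2.txt': 'James',
    'file3.txt': 'John',
}

grouped = group_by_owners(files)
print(grouped)

# dict_items is a view of the (key, value) pairs; wrap it in list()
# if an actual list of tuples is needed
pairs = list(files.items())
print(pairs)
```

Unlike defaultdict, this returns a plain dict, so it prints without the defaultdict(...) wrapper and can be passed to json.dumps in the same way.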