Python Forum
Basic one: Aggregating from a dictionary




Basic one: Aggregating from a dictionary - Mustey - Aug-12-2019

I am so sorry, that's a stupid mistake (I forgot to return!). Feel free to delete!


Sorry, I haven't coded for 2 years (and I wasn't any good when I did!) and now I need to write a Python script.
Basically, I need to tell people which files they own on the system. I've managed to get the files "crawled", producing a JSON-like dict:
{'file1.txt':'John','file2.txt':'James','file3.txt':'John'}

My next step is to aggregate them like so (this is the format my next method expects):
{'John':['file1.txt','file3.txt'],'James':['file2.txt']}

I don't care about the order. Plus I know how to .sort() if I need to :)
I have a feeling I might be able to use NumPy, a Pandas DataFrame or some other ready-made solution, but I don't know these yet and I don't want to jump the gun.
Also, it really irritates me that I have something that should work, but I seem to be missing a point about the language semantics.

Here's what I got:
def group_by_owners(to_group):
    # get list of owners:
    owners = to_group.values()
    # Remove duplicated values:
    owners = list(set(owners))
    # Prepare a list of tuples from the dict:
    ownership_tuples = to_group.items()

    # Find files per each owner:
    result = {}
    for current_owner in owners:
        current_owner_files=[]
        # Brute search through data:
        for (file_name, owner_name) in ownership_tuples:
            if owner_name == current_owner:
                current_owner_files.append(file_name)
        result[current_owner] = current_owner_files

files = {
    'file1.txt': 'John',
    'file2.txt': 'James',
    'file3.txt': 'John'
}

print (group_by_owners(files))
Running this prints: None
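
For reference, here is a minimal sketch of the same function with the missing return added, which is the fix named in the note at the top of this post. The logic is otherwise the same as above, only slightly condensed. A Python function that reaches its end without a return statement hands back None, which is why None was printed.

```python
def group_by_owners(to_group):
    # Unique owner names (the set removes duplicates)
    owners = set(to_group.values())

    result = {}
    for current_owner in owners:
        current_owner_files = []
        # Brute-force search through all (file, owner) pairs
        for file_name, owner_name in to_group.items():
            if owner_name == current_owner:
                current_owner_files.append(file_name)
        result[current_owner] = current_owner_files

    # Without this line the function implicitly returns None
    return result


files = {
    'file1.txt': 'John',
    'file2.txt': 'James',
    'file3.txt': 'John'
}

grouped = group_by_owners(files)
print(grouped)
```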

1. I thought that with "(file_name, owner_name)" I would be able to traverse the "ownership_tuples" list but maybe I am wrong?
2. I googled for an hour but I couldn't find help on how to set a dictionary value when the key is a variable. I just guessed here:
result[current_owner]
3. When I do print(ownership_tuples) I get:
dict_items([('file1.txt', 'John'), ('file2.txt', 'James'), ('file3.txt', 'John')])
What is this 'dict_items'? Is it an indicator that I am doing something wrong? I expect to print a tuple, not a dictionary.
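
A short sketch that may help with questions 1 to 3, assuming Python 3: .items() returns a dict_items view, which is an iterable of (key, value) tuples rather than a list, so looping over it with "(file_name, owner_name)" works as intended. The names items, pairs, key and result below are only illustrative.

```python
files = {'file1.txt': 'John', 'file2.txt': 'James', 'file3.txt': 'John'}

# .items() gives a view onto the dict, not a copy of it
items = files.items()
print(type(items).__name__)

# Each element of the view is a (key, value) tuple
pairs = list(items)
first_file, first_owner = pairs[0]
print(first_file, first_owner)

# Setting a value when the key is held in a variable:
# the variable simply goes inside the square brackets
result = {}
key = first_owner
result[key] = [first_file]
print(result)

# The view follows later changes to the dict
files['file4.txt'] = 'Ann'
print(len(items))
```

So seeing dict_items in the printed output is normal and not a sign of a mistake. Wrapping the view in list() gives a plain list of tuples if one is needed.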

PS: any ideas to make this more elegant will be happily accepted, as I am just warming up my brain to programming again and could use some inspiration!
PS2: It's a surprise I get so much done with computers. I am a bottom-feeder in the world of programming. Apologies for my beginner questions!


RE: Basic one: Aggregating from a dictionary - buran - Aug-12-2019

It can probably be done with NumPy too, but there is defaultdict in the collections module:

from collections import defaultdict
import json
spam = defaultdict(list)
files = {'file1.txt': 'John', 'file2.txt': 'James', 'file3.txt': 'John'}
for fname, owner in files.items():
    spam[owner].append(fname)
print(spam)

# dump as json
eggs = json.dumps(spam)
print(eggs)
Output:
defaultdict(<class 'list'>, {'John': ['file1.txt', 'file3.txt'], 'James': ['file2.txt']})
{"John": ["file1.txt", "file3.txt"], "James": ["file2.txt"]}
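
If the next method needs a plain dict rather than a defaultdict, one alternative (a sketch, not part of the code above) is dict.setdefault, which does the same grouping in a single pass without any import. The name grouped is only illustrative.

```python
files = {'file1.txt': 'John', 'file2.txt': 'James', 'file3.txt': 'John'}

grouped = {}
for fname, owner in files.items():
    # setdefault returns the existing list for this owner,
    # or stores and returns a new empty list on first sight
    grouped.setdefault(owner, []).append(fname)

print(grouped)
```

Calling dict(spam) on the defaultdict result should also give an ordinary dict with the same contents.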