Python Forum

Full Version: Basic one: Aggregating from a dictionary
I am so sorry, that's a stupid mistake (I forgot to return!). Feel free to delete!


Sorry, I haven't coded for 2 years (and I wasn't any good when I did!) and now I need to write a python script.
Basically, I need to tell people what files they own on the system. I've managed to get the files "crawled", producing a JSON like:
{'file1.txt':'John','file2.txt':'James','file3.txt':'John'}

My next step is to aggregate them like so (this is the format my next method expects):
{'John':['file1.txt','file3.txt'],'James':['file2.txt']}

I don't care about the order. Plus I know how to .sort() if I need to :)
I have a feeling I might be able to use NumPy, a pandas DataFrame or some other instant solution, but I don't know these yet
and I don't want to jump the gun.
Also, it really irritates me that I have something that should work, but I seem to be missing a point about language semantics.

Here's what I got:
def group_by_owners(to_group):
    # get list of owners:
    owners = to_group.values()
    # Remove duplicated values:
    owners = list(set(owners))
    # Prepare a list of tuples from the dict:
    ownership_tuples = to_group.items()

    # Find files per each owner:
    result = {}
    for current_owner in owners:
        current_owner_files=[]
        # Brute search through data:
        for (file_name, owner_name) in ownership_tuples:
            if owner_name == current_owner:
                current_owner_files.append(file_name)
        result[current_owner] = current_owner_files

files = {
    'file1.txt': 'John',
    'file2.txt': 'James',
    'file3.txt': 'John'
}

print (group_by_owners(files))
Running this returns: None
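As the note at the top of the thread says, the function builds result but never returns it, so Python hands back None by default. A minimal sketch of the same function with that one line added (the rest is the posted logic, only lightly tidied):

```python
def group_by_owners(to_group):
    # Distinct owners; a set is enough, no need to convert back to a list.
    owners = set(to_group.values())

    result = {}
    for current_owner in owners:
        current_owner_files = []
        # Brute-force search through all (file, owner) pairs.
        for file_name, owner_name in to_group.items():
            if owner_name == current_owner:
                current_owner_files.append(file_name)
        result[current_owner] = current_owner_files

    # The missing line: without it the function implicitly returns None.
    return result


files = {
    'file1.txt': 'John',
    'file2.txt': 'James',
    'file3.txt': 'John'
}

grouped = group_by_owners(files)
print(grouped)
```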

1. I thought that with "(file_name, owner_name)" I would be able to traverse the "ownership_tuples" list but maybe I am wrong?
2. I googled for an hour but I couldn't find help on how to set a dictionary value when the key is a variable. I just guessed here:
result[current_owner]
3. When I do print(ownership_tuples) I get:
dict_items([('file1.txt', 'John'), ('file2.txt', 'James'), ('file3.txt', 'John')])
What is this 'dict_items'? Is it an indicator that I am doing something wrong? I expect to print a tuple, not a dictionary.
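For what it's worth, none of the three points above is a mistake. Unpacking "(file_name, owner_name)" in the for loop works, result[current_owner] = ... is the normal way to set a value under a variable key, and dict_items is just the view object that .items() returns in Python 3. A small sketch to illustrate (the names pairs, pairs_list and first_pair are made up for this example):

```python
files = {'file1.txt': 'John', 'file2.txt': 'James', 'file3.txt': 'John'}

# .items() returns a view object of type dict_items, not a list.
pairs = files.items()
print(type(pairs).__name__)

# Each element of the view is a (key, value) tuple, so unpacking works.
for file_name, owner_name in pairs:
    print(file_name, owner_name)

# Wrap the view in list() if an actual list of tuples is needed.
pairs_list = list(pairs)
first_pair = pairs_list[0]

# Setting a value with a variable as the key is plain subscript assignment.
result = {}
current_owner = 'John'
result[current_owner] = ['file1.txt']
print(result)
```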

PS: any ideas to make this more elegant will be happily accepted, as I am just warming up my brain to programming again and could use some inspiration!
PS2: It's a surprise I get so much done with computers. I am a bottom-feeder in the world of programming. Apologies for my beginner questions!
It can probably be done with NumPy too, but the standard library already has defaultdict in the collections module:

from collections import defaultdict
import json
spam = defaultdict(list)
files = {'file1.txt': 'John', 'file2.txt': 'James', 'file3.txt': 'John'}
for fname, owner in files.items():
    spam[owner].append(fname)
print(spam)

# dump as json
eggs = json.dumps(spam)
print(eggs)
Output:
defaultdict(<class 'list'>, {'John': ['file1.txt', 'file3.txt'], 'James': ['file2.txt']})
{"John": ["file1.txt", "file3.txt"], "James": ["file2.txt"]}
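If the next method expects a plain dict rather than a defaultdict, two options come to mind. This is only a sketch: the names grouped and as_plain are made up here, and whether the conversion is needed at all depends on what that method does with its input.

```python
from collections import defaultdict

files = {'file1.txt': 'John', 'file2.txt': 'James', 'file3.txt': 'John'}

# Option 1: no imports, setdefault creates the empty list on first sight
# of an owner and returns the existing list afterwards.
grouped = {}
for fname, owner in files.items():
    grouped.setdefault(owner, []).append(fname)
print(grouped)

# Option 2: keep the defaultdict approach, then convert to a plain dict.
spam = defaultdict(list)
for fname, owner in files.items():
    spam[owner].append(fname)
as_plain = dict(spam)
print(as_plain)
```

Both loops go over the data once, instead of once per owner as in the original function.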