Data Dictionaries in Python

DeaD_EyE · (This post was last modified: Nov-25-2019, 08:26 AM by DeaD_EyE.)

You can write a filter function.
Before jumping straight into pandas, you should also know the plain Python tools.

def filter_by(data, **kwargs):
    for row in data:
        for key, value in kwargs.items():
            if row.get(key) != value:
                break
        else:
            yield row


# if the results are saved in the list result
list(filter_by(result, Pos='4'))

This example does no type conversion; it is a plain equality comparison on strings.
The else block of a for loop runs only if the loop finishes iterating without hitting a break.
Breaking out of the loop early skips the else block.
So a row is yielded only if every requested key exists and holds the wanted value.
Using dict.get() is necessary unless you check beforehand that the key exists;
row[key] would raise a KeyError for a missing key.
By default, dict.get() returns None if the key does not exist.
You can pass a different default as the second argument to get().
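
A minimal sketch of that behaviour, using a few made-up rows (the third row without a 'Pos' key is hypothetical, added only to show the missing-key case). A row that lacks the key is skipped instead of raising a KeyError, and several keyword arguments must all match:

```python
def filter_by(data, **kwargs):
    for row in data:
        for key, value in kwargs.items():
            if row.get(key) != value:
                break          # one mismatch is enough to reject the row
        else:
            yield row          # reached only if the inner loop never hit break


rows = [
    {"Pos": "4", "Club": "Liverpool"},
    {"Pos": "5", "Club": "Tottenham Hotspur"},
    {"Club": "No position"},   # hypothetical row without a 'Pos' key
]

match_pos = list(filter_by(rows, Pos="4"))
match_both = list(filter_by(rows, Pos="4", Club="Arsenal"))

missing = rows[2].get("Pos")              # key absent, default is None
with_default = rows[2].get("Pos", "n/a")  # key absent, explicit default

print(match_pos)
print(match_both)
print(missing, with_default)
```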

Calling the function returns a generator. The generator does nothing until
it is consumed by a for loop or by a constructor such as tuple, list, or set.
Any function that contains a yield statement is a generator function.

Generators can do funny stuff, like generating infinite sequences.
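
A small sketch of both points; the names noisy and naturals are made up for this illustration. The first generator shows that nothing in the body runs until a value is requested, the second is an infinite sequence that is only safe to consume in slices:

```python
from itertools import islice

log = []


def noisy():
    log.append("body started")   # runs only when the first value is requested
    yield 1


gen = noisy()
log_before = list(log)           # still empty: calling noisy() ran nothing
first = next(gen)
log_after = list(log)


def naturals(start=0):
    """Infinite generator: 0, 1, 2, ... (never call list() on it directly)."""
    n = start
    while True:
        yield n
        n += 1


numbers = naturals()
first_five = list(islice(numbers, 5))  # take only five values
next_value = next(numbers)             # the generator remembers where it was

print(log_before, log_after)
print(first_five, next_value)
```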

mrsenorchuck · Nov-25-2019, 09:36 AM

(Nov-25-2019, 08:26 AM)DeaD_EyE Wrote: You can make a filter-function.

Thank you for this detail.

So, as you say, it is better to understand dictionaries, lists, etc. before jumping straight into pandas, and that is what I am trying to do.

I have two items in my key: all-time league position and club.

What is kwargs in your snippet?

I am assuming the 'value' is a string value you are passing to the filter?

Can you only filter on key values? What if I wanted to filter on teams that have been relegated but won the league like Leicester?

Regards,
Aidan.

perfringo · (This post was last modified: Nov-25-2019, 10:51 AM by perfringo.)

You should modify the filter provided by DeaD_EyE according to your needs.

One way to get the result is to convert the value to int and compare numerically instead. The following function skips rows whose values are less than the given arguments and yields only the club name, not the whole row:

def filter_by(data, **kwargs):
    for row in data:
        for key, value in kwargs.items():
            # note: int() raises TypeError if the key is missing (get() returns None)
            if int(row.get(key)) < value:
                break
        else:
            yield row['Club']

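The loop breaks, and the row is skipped, if the club has never been relegated (Relegated < 1) or has never finished first (First < 1). In other words, this keeps only teams that have been relegated at least once and have finished first at least once: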

>>> list(filter_by(data, Relegated=1, First=1))
['Manchester City']

As you can see, this is not easy to understand. So it is worth taking the extra step of thinking through what you want to achieve and how to express it in a way that is easy to read.
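
One possible way to make the intent more readable, sketched under assumptions: filter_where and relegated_and_champion are hypothetical names, and the three rows are invented placeholders, not real league data. The condition lives in a small named function instead of being encoded in keyword arguments:

```python
def filter_where(data, predicate):
    """Yield every row for which predicate(row) is true."""
    return (row for row in data if predicate(row))


def relegated_and_champion(row):
    # values are strings when they come straight from a CSV file
    return int(row["Relegated"]) >= 1 and int(row["First"]) >= 1


# invented placeholder rows, only to make the sketch runnable
rows = [
    {"Club": "Club A", "First": "2", "Relegated": "1"},
    {"Club": "Club B", "First": "0", "Relegated": "3"},
    {"Club": "Club C", "First": "5", "Relegated": "0"},
]

clubs = [row["Club"] for row in filter_where(rows, relegated_and_champion)]
print(clubs)
```

The name of the predicate states the rule, so the call site reads almost like the question being asked.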

mrsenorchuck · Nov-25-2019, 02:58 PM

(Nov-25-2019, 10:51 AM)perfringo Wrote: You should modify filter provided by Dead_EyE according to your needs.

Thank you for your reply.

So Dead_EyE's code can be used to filter down the records in a data dictionary?

How can I apply this to my code below?

Is it possible to sum a value like 'Win', or would I have to add all the Win values to a list first?

premier = {} 

print()

# open the file
with open(r"Historic_PL.csv") as data_file:
    # read in the first line containing the headers
    headers = data_file.readline()
    
    # for each other line in the file
    for line in data_file:
        # split each line into components (remove white space from ends of line)
        Pos,Club,Seasons,Pld,Win,Draw,Loss,GF,GA,GD,Pts,First,Second,Third,Fourth,Relegated,Best = line.strip().split(",")

        # insert the data into the dictionary
        premier[int(Pos)] = (Club,int(Seasons),int(Pld),int(Win),int(Draw),int(Loss),int(GF),int(GA),int(GD),int(Pts),int(First),int(Second),int(Third),int(Fourth),int(Relegated),int(Best))

Pos,Club,Seasons,Pld,Win,Draw,Loss,GF,GA,GD,Pts,First,Second,Third,Fourth,Relegated,Best
1,Manchester United,27,1038,648,224,166,1989,929,1060,2168,13,6,3,1,0,1
2,Arsenal,27,1038,565,260,213,1845,1013,832,1955,3,6,5,7,0,1
3,Chelsea,27,1038,558,257,223,1770,1002,768,1931,5,4,5,2,0,1
4,Liverpool,27,1038,529,262,247,1774,1046,728,1849,0,4,5,7,0,2
5,Tottenham Hotspur,27,1038,446,257,335,1547,1306,241,1595,0,1,2,3,0,2
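
For the summing question, a minimal sketch, assuming the five sample rows above stand in for the whole file (io.StringIO is used only so the example runs without Historic_PL.csv). csv.DictReader turns each line into a dict keyed by the header names, which is the row shape that filter_by expects, and sum() accepts a generator expression, so no intermediate list is needed:

```python
import csv
import io

# the sample rows from this post; with the real file you would use
# open("Historic_PL.csv", newline="") instead of io.StringIO(sample)
sample = """Pos,Club,Seasons,Pld,Win,Draw,Loss,GF,GA,GD,Pts,First,Second,Third,Fourth,Relegated,Best
1,Manchester United,27,1038,648,224,166,1989,929,1060,2168,13,6,3,1,0,1
2,Arsenal,27,1038,565,260,213,1845,1013,832,1955,3,6,5,7,0,1
3,Chelsea,27,1038,558,257,223,1770,1002,768,1931,5,4,5,2,0,1
4,Liverpool,27,1038,529,262,247,1774,1046,728,1849,0,4,5,7,0,2
5,Tottenham Hotspur,27,1038,446,257,335,1547,1306,241,1595,0,1,2,3,0,2
"""

rows = list(csv.DictReader(io.StringIO(sample)))  # list of dicts, values are strings

# sum one column across all rows without building a list first
total_wins = sum(int(row["Win"]) for row in rows)

# plain filtering on a column value (string comparison, no conversion)
fourth = [row["Club"] for row in rows if row["Pos"] == "4"]

print(total_wins)
print(fourth)
```

With the premier dictionary as built above, Win sits at index 3 of each tuple, so sum(team[3] for team in premier.values()) would give the same total for the same data.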

DeaD_EyE · (This post was last modified: Nov-25-2019, 04:07 PM by DeaD_EyE.)

Quote:What is kwargs in your snippet?

kwargs is just the name used by convention for keyword arguments.
The ** in front of the name is the important part.
All keyword arguments that are not matched by a named parameter are collected into a dict.

def foo(**kwargs):
    print(kwargs)


foo(name='Bar', age=1, color='green')

Output:
{'name': 'Bar', 'age': 1, 'color': 'green'}
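
A short sketch of what "left over" means, using a hypothetical function describe: name is taken by the named parameter, and only the remaining keyword arguments land in kwargs. The same ** syntax also works in the other direction, unpacking a dict into keyword arguments at the call site:

```python
def describe(name, **kwargs):
    # 'name' is bound to the named parameter; everything else goes into kwargs
    return name, kwargs


result = describe(name="Bar", age=1, color="green")

# ** at the call site unpacks a dict into keyword arguments
options = {"age": 1, "color": "green"}
same = describe("Bar", **options)

print(result)
print(same == result)
```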

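Since Python 3.8 a function can declare three kinds of parameters:

positional-only arguments (new in 3.8, listed before a /)
arguments that can be passed by position or by keyword
keyword-only arguments (listed after a * or *args)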
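
A minimal sketch of the three kinds in one signature; scale and its parameters are made up for illustration, and the / marker needs Python 3.8 or newer, so this will not run on 3.7:

```python
def scale(value, /, factor=2, *, clamp=None):
    # value  : positional-only (before the /)
    # factor : positional or keyword
    # clamp  : keyword-only (after the *)
    result = value * factor
    if clamp is not None:
        result = min(result, clamp)
    return result


a = scale(3)                  # factor defaults to 2
b = scale(3, 4)               # factor passed by position
c = scale(3, factor=4)        # factor passed by keyword
d = scale(3, 4, clamp=10)     # clamp must be passed by keyword

try:
    scale(value=3)            # value is positional-only
    rejected = False
except TypeError:
    rejected = True

print(a, b, c, d, rejected)
```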

mrsenorchuck · Nov-25-2019, 04:14 PM

(Nov-25-2019, 04:07 PM)DeaD_EyE Wrote: Since Python 3.8 we have: positional only arguments (this is new)

Thank you for your reply. I am on Python 3.7. Isn't Python 3.8 in beta?

I need to work with 3.7 due to a restriction on the device I am working on.

mrsenorchuck · Nov-25-2019, 09:29 PM

In your example below, are you defining a function "filter_by" that takes the data dictionary, and then passing position 4 to it so that it returns the values for the team in position four?

(Nov-25-2019, 08:26 AM)DeaD_EyE Wrote: You can make a filter-function.
Instead of jumping directly into pandas, you should know also the Python stuff.
def filter_by(data, **kwargs):
    for row in data:
        for key, value in kwargs.items():
            if row.get(key) != value:
                break
        else:
            yield row


# if the results are saved in the list result
list(filter_by(result, Pos='4'))
