Python Forum
Data Dictionaries in Python
#11
You can write a filter function.
Instead of jumping straight into pandas, you should also know the plain Python way.


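def filter_by(data, **kwargs):
    # Yield every row (a dict) whose values match all given key=value pairs.
    for row in data:
        for key, value in kwargs.items():
            if row.get(key) != value:
                break  # one mismatch is enough to reject the row
        else:
            # no break happened: every pair matched
            yield row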


# assuming the rows are stored as dicts in a list named result
list(filter_by(result, Pos='4'))
This example does no type conversion; it is a plain equality comparison between strings.
The else block of a for loop runs only if the loop finished iterating without hitting a break.
Breaking out of the loop early skips the else block.
So a row is yielded only if every requested key exists and holds the wanted value.
Using dict.get() is necessary if you do not check beforehand that the key exists.
Otherwise row[key] raises a KeyError for a missing key.
By default, dict.get() returns None if the key does not exist.
You can pass a different default as the second argument to get().
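
To make the for/else and dict.get() behaviour concrete, here is a small sketch. The sample row and the helper name matches are made up for illustration; they are not part of the code above.

```python
# Hypothetical sample row; the values are only for illustration.
row = {"Pos": "4", "Club": "Liverpool"}

# dict.get() returns None for a missing key, or the default you pass in.
missing = row.get("Pts")
fallback = row.get("Pts", "0")


def matches(row, **wanted):
    """Return True if every key=value pair in wanted is found in row."""
    for key, value in wanted.items():
        if row.get(key) != value:
            break          # mismatch: the else block is skipped
    else:
        return True        # loop ended without break: all pairs matched
    return False


hit = matches(row, Pos="4", Club="Liverpool")
miss = matches(row, Pos="4", Pts="1849")  # "Pts" is missing, so None != "1849"
print(missing, fallback, hit, miss)
```

Note that calling matches(row) with no keyword arguments also returns True, because an empty loop never hits break.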

Calling the function returns a generator. The generator does nothing until
it is consumed by a for loop or by a type such as tuple, list or set.
If a function contains a yield statement, it is a generator function.

Generators can do fun things, like producing infinite sequences.
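
A minimal sketch of such an infinite sequence, and of the lazy behaviour described above. The function name naturals is just an example; itertools.islice is used here to take a finite slice.

```python
from itertools import islice


def naturals(start=1):
    """Yield start, start + 1, start + 2, ... without ever stopping."""
    n = start
    while True:
        yield n
        n += 1


gen = naturals()                    # nothing has run yet, this only creates the generator
first_five = list(islice(gen, 5))   # consumes exactly five values
next_value = next(gen)              # the generator resumes where it stopped
print(first_five, next_value)
```

Calling list(naturals()) directly would never finish, which is why the slice is needed.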
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
#12
(Nov-25-2019, 08:26 AM)DeaD_EyE Wrote: You can make a filter-function.

Thank you for the detail.

So, as you say, it is better to understand dictionaries, lists, etc. before jumping straight into pandas, and that is what I am trying to achieve.

I have two items in my key, which are all-time league position and club.

What is kwargs in your snippet?

I am assuming 'value' is a string value you are passing to the filter?

Can you only filter on key values? What if I wanted to filter on teams that have been relegated but have also won the league, like Leicester?

Regards,
Aidan.
#13
You should modify the filter provided by DeaD_EyE according to your needs.

One way to get the result is to convert the value to int and compare numerically instead. The following function skips rows whose values are less than the arguments given, and yields only the club name rather than the whole row:

def filter_by(data, **kwargs):
    for row in data:
        for key, value in kwargs.items():
            # a missing key counts as 0 instead of raising TypeError on int(None)
            if int(row.get(key, 0)) < value:
                break
        else:
            yield row['Club']
A row is skipped (break) if the team was never relegated (Relegated < 1) or never finished first (First < 1). In other words, this keeps only teams that have been relegated at least once and have finished first at least once:

>>> list(filter_by(data, Relegated=1, First=1))
['Manchester City']
You can see that this is not very easy to understand. So it is worth taking an extra step to think through what you want to achieve and how to express it in a way that is easy to follow.
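
One possible way to make the intent easier to read is to pass a named predicate function instead of thresholds. This is only a sketch: filter_rows and relegated_and_champion are hypothetical names, and the sample rows are invented, not taken from the real CSV.

```python
def filter_rows(data, predicate):
    """Yield the rows for which predicate(row) is true."""
    for row in data:
        if predicate(row):
            yield row


def relegated_and_champion(row):
    # Missing keys count as 0; values are assumed to be numeric strings.
    return int(row.get("Relegated", 0)) >= 1 and int(row.get("First", 0)) >= 1


# Invented sample rows, for illustration only.
sample = [
    {"Club": "Club A", "Relegated": "0", "First": "3"},
    {"Club": "Club B", "Relegated": "2", "First": "1"},
    {"Club": "Club C", "Relegated": "1", "First": "0"},
]

clubs = [row["Club"] for row in filter_rows(sample, relegated_and_champion)]
print(clubs)
```

The name of the predicate says what is being kept, so the call site no longer needs a separate explanation.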
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
#14
(Nov-25-2019, 10:51 AM)perfringo Wrote: You should modify filter provided by Dead_EyE according to your needs.

Thank you for your reply.

So DeaD_EyE's code can be used to filter down the records in a data dictionary?

How can I apply this to my code below?

Is it possible to sum a value like 'Win', or would I have to add all the Win values to a list to do this?

premier = {} 

print()

# open the file
with open(r"Historic_PL.csv") as data_file:
    # read in the first line containing the headers
    headers = data_file.readline()
    
    # for each other line in the file
    for line in data_file:
        # split each line into components (remove white space from ends of line)
        Pos,Club,Seasons,Pld,Win,Draw,Loss,GF,GA,GD,Pts,First,Second,Third,Fourth,Relegated,Best = line.strip().split(",")

        # insert the data into the dictionary
        premier[int(Pos)] = (Club,int(Seasons),int(Pld),int(Win),int(Draw),int(Loss),int(GF),int(GA),int(GD),int(Pts),int(First),int(Second),int(Third),int(Fourth),int(Relegated),int(Best))
Pos,Club,Seasons,Pld,Win,Draw,Loss,GF,GA,GD,Pts,First,Second,Third,Fourth,Relegated,Best
1,Manchester United,27,1038,648,224,166,1989,929,1060,2168,13,6,3,1,0,1
2,Arsenal,27,1038,565,260,213,1845,1013,832,1955,3,6,5,7,0,1
3,Chelsea,27,1038,558,257,223,1770,1002,768,1931,5,4,5,2,0,1
4,Liverpool,27,1038,529,262,247,1774,1046,728,1849,0,4,5,7,0,2
5,Tottenham Hotspur,27,1038,446,257,335,1547,1306,241,1595,0,1,2,3,0,2
#15
Quote:What is kwargs in your snippet?

kwargs is just a name used by convention for keyword arguments.
The ** in front of the name is the important part.
All keyword arguments that are not matched by a named parameter are collected into a dict.

def foo(**kwargs):
    print(kwargs)


foo(name='Bar', age=1, color='green')
Output:
{'name': 'Bar', 'age': 1, 'color': 'green'}
Since Python 3.8 there are three kinds of named parameters:
  • positional-only parameters (this is new)
  • parameters that can be passed by position or by keyword
  • keyword-only parameters
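
A small sketch of how these kinds can appear together in one signature. The function describe and its parameters are made up for illustration. The / marker needs Python 3.8 or newer; the * marker and **kwargs also work on older versions such as 3.7.

```python
def describe(club, /, pos, *, points=0, **kwargs):
    # club   : positional-only (everything before the /)
    # pos    : can be passed by position or by keyword
    # points : keyword-only (everything after the *)
    # kwargs : any leftover keyword arguments, collected into a dict
    return club, pos, points, kwargs


result = describe("Arsenal", pos=2, points=1955, seasons=27)
print(result)
```

Here seasons=27 does not match any named parameter, so it ends up in kwargs.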
#16
(Nov-25-2019, 04:07 PM)DeaD_EyE Wrote:

Thank you for your reply. I am on Python 3.7. Isn't Python 3.8 in beta?

I need to work with 3.7 due to a restriction on the device I am working on.
#17
In your example below, are you defining a function "filter_by" that takes the data dictionary, and then passing position 4 to it so that it returns the values for the team in position four?
(Nov-25-2019, 08:26 AM)DeaD_EyE Wrote: You can make a filter-function.
Instead of jumping directly into pandas, you should know also the Python stuff.


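def filter_by(data, **kwargs):
    # Yield every row (a dict) whose values match all given key=value pairs.
    for row in data:
        for key, value in kwargs.items():
            if row.get(key) != value:
                break  # one mismatch is enough to reject the row
        else:
            # no break happened: every pair matched
            yield row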


# if the results are saved in the list result
list(filter_by(result, Pos='4'))