Is there a better data structure than classes for a set of employes?

Is there a better data structure than classes for a set of employes? - Schlangenversteher - Feb-26-2020

Hello,

I have around 100 uniformly structured employee records that I would like to read and process in my code. Most of the data, like name or nationality, can be used directly; other parameters, like the employee's actual cost for the company, need to be derived from the data.
In the beginning, I used a dictionary of dictionaries to hold the data set. This turned out to be unreadable, awful code. Therefore I moved the data into a .yml file and wrote a script that reads the .yml, creates a class object for every employee, and passes a list of employee objects to the actual code.
While this works for me, it still looks kind of strange and off at times.
Is there a better way to store and process this data? I was thinking about using SQLite or some data structure that pandas offers.
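
For reference, the SQLite idea could be sketched with Python's built-in sqlite3 module roughly as follows. The table layout, the monthly_salary column, and the 1.3 overhead factor are assumptions made up for illustration; the thread never says how the cost is actually derived.

```python
import sqlite3

# Hypothetical sample records; only name and nationality appear in the thread.
records = [
    {"name": "susan", "nationality": "netherland", "monthly_salary": 4000.0},
    {"name": "john", "nationality": "usa", "monthly_salary": 5000.0},
]

# An in-memory database keeps the sketch self-contained;
# pass a file path instead to persist the data.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can then be accessed by column name
conn.execute(
    "CREATE TABLE employee (name TEXT, nationality TEXT, monthly_salary REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (:name, :nationality, :monthly_salary)",
    records,
)

# Derive the cost in the query instead of storing it.
# The factor 1.3 is a placeholder, not a real formula.
rows = conn.execute(
    "SELECT name, monthly_salary * 12 * 1.3 AS overall_cost "
    "FROM employee ORDER BY name"
).fetchall()

for row in rows:
    print(row["name"], row["overall_cost"])
```

For about 100 records a database is probably more machinery than needed, but it becomes attractive once the data has to be filtered, joined, or edited by more than one program.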

RE: Is there a better data structure than classes for a set of employes? - Larz60+ - Feb-26-2020

It's hard to understand exactly what you are talking about without a sample of data.
A dictionary is a very good way to store structured data.
Show an example of your data as a dictionary and also as yaml.

RE: Is there a better data structure than classes for a set of employes? - Schlangenversteher - Feb-26-2020

Example with dictionaries:

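# a list of dicts; the original set literal {{...}} would raise
# "TypeError: unhashable type: 'dict'" and could not be indexed with [0]
employeList = [
    {
        "name": "susan",
        "nationality": "netherland",
        "overallCost": None
    },
    ...
]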

# resolve overall cost for susan
employeList[0]["overallCost"] = resolveOverallCost(employeList[0])

workWithEmployeList()

Example with yaml:

# employe.yml
- name: susan
  nationality: netherland
...

# python script
import yaml

class employeObject:
    overallCost = None
    def __init__(self, name, nationality):
        self.name = name
        self.nationality = nationality
        self.resolveOverallCost()

    def resolveOverallCost(self):
        ...
        self.overallCost = calculatedValue

# yamlfile is the path to employe.yml
with open(yamlfile) as f:
    employeList = yaml.safe_load(f)

# build one object per record
employeObjectList = []
for employe in employeList:
    employeObjectList.append(employeObject(employe["name"], employe["nationality"]))

workWithEmployeObjectList()

RE: Is there a better data structure than classes for a set of employes? - buran - Feb-26-2020

Classes look just fine (given that you also want derived properties/attributes). Of course, as Larz said, a built-in data structure like a dict or a named tuple can also be used, but this looks like a nice use case for a custom class.
You can write the whole class from scratch or, to make it easier, have a look at @dataclass, which will create __init__ and some other dunder methods for your class.
We have a nice tutorial by @snippsat

If you show your code as well as some sample data and what the derived data would look like, we can help with further guidance.
EDIT: you did post your code while I was answering
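
The named tuple option mentioned above could look roughly like the sketch below, using typing.NamedTuple. The monthly_salary field and the cost formula are hypothetical stand-ins, since the thread does not show what the cost is derived from.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    name: str
    nationality: str
    monthly_salary: float  # hypothetical field, not part of the thread's data

    @property
    def cost(self) -> float:
        # Placeholder formula; the real calculation is not given in the thread.
        return self.monthly_salary * 12


susan = Employee(name="susan", nationality="netherland", monthly_salary=4000.0)
print(susan)        # named tuples get a readable repr for free
print(susan.cost)   # derived value, computed on access
```

A named tuple is immutable, so it suits records that are loaded once and only read afterwards; if fields need to change later, a regular class or a dataclass is the easier fit.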

RE: Is there a better data structure than classes for a set of employes? - buran - Feb-26-2020

Where do you get the data to calculate the cost?
Do you have extra fields per employee in the yaml, or just the name and nationality?
Also, you can make the cost a property (using the @property decorator) instead of having an overallCost attribute and a resolveOverallCost method.

RE: Is there a better data structure than classes for a set of employes? - buran - Feb-26-2020

employees.yaml

- name: John
  nationality: USA
- name: Jane
  nationality: UK

import yaml
from random import randint
from dataclasses import dataclass

# one way to define basic class
class Employee:
    def __init__(self, name, nationality):
        self.name = name
        self.nationality = nationality

    @property
    def cost(self):
        some_calculated_cost = randint(0, 20)  # placeholder: a random cost between 0 and 20
        return some_calculated_cost

# alternative, using @dataclass
@dataclass
class Employee2:
    name: str
    nationality: str


    @property
    def cost(self):
        some_calculated_cost = randint(0, 20)  # placeholder: a random cost between 0 and 20
        return some_calculated_cost



if __name__ == '__main__':
    
    # load using Employee class
    with open('employees.yaml') as f:
        employees = [Employee(**empl) for empl in yaml.safe_load(f)]

    # load using Employee2 class
    with open('employees.yaml') as f:
        employees2 = [Employee2(**empl) for empl in yaml.safe_load(f)]

    print(employees) # this one has no __str__ or __repr__ method defined
    print(employees2) # note the difference, this one has __repr__() method autocreated

    for employee in employees:
        print(f'{employee.name}: {employee.cost}')

    for employee in employees2:
        print(f'{employee.name}: {employee.cost}')
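
One thing to keep in mind with the property approach: a property is re-evaluated on every access, so with the randint placeholder the same employee can report a different cost each time it is read. If the derived value should be computed once and then stay fixed, a dataclass can fill it in from __post_init__. The sketch below is only an illustration; the Employee3 name, the monthly_salary field, and the formula are assumptions, not something taken from the thread.

```python
from dataclasses import dataclass, field


@dataclass
class Employee3:
    name: str
    nationality: str
    monthly_salary: float  # hypothetical input field
    # Excluded from __init__; filled in by __post_init__ below.
    cost: float = field(init=False)

    def __post_init__(self):
        # Placeholder formula, computed once when the object is created.
        self.cost = self.monthly_salary * 12


# Same shape as the list yaml.safe_load would return, plus the extra field.
raw = [
    {"name": "John", "nationality": "USA", "monthly_salary": 5000.0},
    {"name": "Jane", "nationality": "UK", "monthly_salary": 4500.0},
]
employees3 = [Employee3(**empl) for empl in raw]

for employee in employees3:
    print(f"{employee.name}: {employee.cost}")
```

The trade-off is that a stored cost goes stale if monthly_salary is changed afterwards, whereas a property always reflects the current fields.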