Python Forum

Hello,
could someone help me please with a special programming question:

I have a dataframe like this:

Datum Mitarbeiter Tageszeit
0 01.01.2023 M51 V
10 01.01.2023 M51 N
37 02.01.2023 M53 V
91 02.01.2023 M51 V
506 02.01.2023 M53 N
... ... ... ...
323288 04.12.2024 M53 N
323608 05.12.2024 M53 V
324199 05.12.2024 M34 N
324210 05.12.2024 M53 N
324419 06.12.2024 M53 V

I would need to calculate a table like this

Datum Tageszeit M51 M53 M34
04.12.2024 V 0 0 0
04.12.2024 N 0 X 0

Where X is set if there is an entry in the original table at e.g. Datum 04.12.2024 Tageszeit V for Mitarbeiter M51 or Mitarbeiter M34 and a 0 if there is not.
V means noon, N means afternoon.

How can I achieve this. I was thinking for a pivot table, but I have no idea how to do this.

Regards,
Gunthert

Don't know how I clicked on this old post, but I did.

I tried pandas .pivot() and .pivot_table() without success, probably I did it wrong.

But you can get the data in a basic way by using:

list_df = df.values.tolist()

Now you have a list of lists with all the contents of your df. Lists are fairly easy to manipulate.

from io import StringIO
import pandas as pd

# small data taken from your post
data = StringIO("""Datum Mitarbeiter Tageszeit
01.01.2023 M51 V
01.01.2023 M51 N
02.01.2023 M53 V
02.01.2023 M51 V
02.01.2023 M53 N
04.12.2024 M53 N
05.12.2024 M53 V
05.12.2024 M34 N
05.12.2024 M53 N
06.12.2024 M53 V""")

# read the data as csv although we don't have a csv
df = pd.read_csv(data, sep=" ")

# get a list of unique Mitarbeiterzeichen
mitwirker = list(set(df['Mitarbeiter'].tolist()))
# sort the result, der Schönheit wegen
mitwirker.sort()
# die anderen gewünschten Zeilen im Resultat
extra_zeilen = ['Datum', 'Tageszeit']
df2_columns = extra_zeilen + mitwirker
# mach eine Liste voller Indexnummern
indexnums = [i for i in range(len(df))]
# create a df full of NaN, but with index
df2 = pd.DataFrame(columns=df2_columns, index=indexnums)

# get all values in df as a list of lists
# in this case, each sub-list has 3 elements: Datum, Mitarbeiter, Tageszeit
list_df = df.values.tolist()
# mach ne Kopie von df2 als Arbeitskopie
test = df2.copy()
# spiele die Werte von list_df in test hinein
for i in range(len(list_df)):
    test.loc[i,['Datum']] = list_df[i][0]
    test.loc[i,['Tageszeit']] = list_df[i][2]
    test.loc[i,[list_df[i][1]]] = 'X'

# überschreibe NaN mit Null
resultat = test.fillna(0)

Result:

Output:        Datum Tageszeit M34 M51 M53
0  01.01.2023         V   0   X   0
1  01.01.2023         N   0   X   0
2  02.01.2023         V   0   0   X
3  02.01.2023         V   0   X   0
4  02.01.2023         N   0   0   X
5  04.12.2024         N   0   0   X
6  05.12.2024         V   0   0   X
7  05.12.2024         N   X   0   0
8  05.12.2024         N   0   0   X
9  06.12.2024         V   0   0   X

I managed to get pivot_table() to work, you need to get the aggfunc parameter correct. By default it is "mean", but there is no "mean" from V and N.

pivot_df = pd.pivot_table(dp, index=["Datum"], columns=["Mitarbeiter"], values="Tageszeit", aggfunc="sum") # yes
resultat = pivot_df.fillna(0) # ok

Gives:

Output:Mitarbeiter M34 M51 M53
Datum                  
01.01.2023    0  VN   0
02.01.2023    0   V  VN
04.12.2024    0   0   N
05.12.2024    N   0  VN
06.12.2024    0   0   V

From this we can see, M34 is a lazy ·[expletive deleted]!

Python Forum

gunther

Pedroski55