Python Forum

Full Version: pivot
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Hello,
could someone help me please with a special programming question:

I have a dataframe like this:

Datum Mitarbeiter Tageszeit
0 01.01.2023 M51 V
10 01.01.2023 M51 N
37 02.01.2023 M53 V
91 02.01.2023 M51 V
506 02.01.2023 M53 N
... ... ... ...
323288 04.12.2024 M53 N
323608 05.12.2024 M53 V
324199 05.12.2024 M34 N
324210 05.12.2024 M53 N
324419 06.12.2024 M53 V

I would need to calculate a table like this

Datum Tageszeit M51 M53 M34
04.12.2024 V 0 0 0
04.12.2024 N 0 X 0

Where X is set if there is an entry in the original table at e.g. Datum 04.12.2024 Tageszeit V for Mitarbeiter M51 or Mitarbeiter M34 and a 0 if there is not.
V means noon, N means afternoon.

How can I achieve this. I was thinking for a pivot table, but I have no idea how to do this.

Regards,
Gunthert
Don't know how I clicked on this old post, but I did.

I tried pandas .pivot() and .pivot_table() without success, probably I did it wrong.

But you can get the data in a basic way by using:

list_df = df.values.tolist()
Now you have a list of lists with all the contents of your df. Lists are fairly easy to manipulate.

from io import StringIO
import pandas as pd

# small data taken from your post
data = StringIO("""Datum Mitarbeiter Tageszeit
01.01.2023 M51 V
01.01.2023 M51 N
02.01.2023 M53 V
02.01.2023 M51 V
02.01.2023 M53 N
04.12.2024 M53 N
05.12.2024 M53 V
05.12.2024 M34 N
05.12.2024 M53 N
06.12.2024 M53 V""")

# read the data as csv although we don't have a csv
df = pd.read_csv(data, sep=" ")

# get a list of unique Mitarbeiterzeichen
mitwirker = list(set(df['Mitarbeiter'].tolist()))
# sort the result, der Schönheit wegen
mitwirker.sort()
# die anderen gewünschten Zeilen im Resultat
extra_zeilen = ['Datum', 'Tageszeit']
df2_columns = extra_zeilen + mitwirker
# mach eine Liste voller Indexnummern
indexnums = [i for i in range(len(df))]
# create a df full of NaN, but with index
df2 = pd.DataFrame(columns=df2_columns, index=indexnums)

# get all values in df as a list of lists
# in this case, each sub-list has 3 elements: Datum, Mitarbeiter, Tageszeit
list_df = df.values.tolist()
# mach ne Kopie von df2 als Arbeitskopie
test = df2.copy()
# spiele die Werte von list_df in test hinein
for i in range(len(list_df)):
    test.loc[i,['Datum']] = list_df[i][0]
    test.loc[i,['Tageszeit']] = list_df[i][2]
    test.loc[i,[list_df[i][1]]] = 'X'

# überschreibe NaN mit Null
resultat = test.fillna(0)
Result:

Output:
Datum Tageszeit M34 M51 M53 0 01.01.2023 V 0 X 0 1 01.01.2023 N 0 X 0 2 02.01.2023 V 0 0 X 3 02.01.2023 V 0 X 0 4 02.01.2023 N 0 0 X 5 04.12.2024 N 0 0 X 6 05.12.2024 V 0 0 X 7 05.12.2024 N X 0 0 8 05.12.2024 N 0 0 X 9 06.12.2024 V 0 0 X
I managed to get pivot_table() to work, you need to get the aggfunc parameter correct. By default it is "mean", but there is no "mean" from V and N.

pivot_df = pd.pivot_table(dp, index=["Datum"], columns=["Mitarbeiter"], values="Tageszeit", aggfunc="sum") # yes
resultat = pivot_df.fillna(0) # ok
Gives:

Output:
Mitarbeiter M34 M51 M53 Datum 01.01.2023 0 VN 0 02.01.2023 0 V VN 04.12.2024 0 0 N 05.12.2024 N 0 VN 06.12.2024 0 0 V
From this we can see, M34 is a lazy ·[expletive deleted]!