Python Forum

Grouping in pandas/multi-index data frame
I am trying to group this data frame, which I read from a CSV file. It is a long data frame with multiple countries in the COUNTRY column and the corresponding party names in the PTYNAME column.



Output:
COUNTRY  CTRYID  YEAR  PTYNAME
Finland     7.0  2017  Centre Party
Finland     7.0  2017  Finns Party
Finland     7.0  2017  National Coalition Party
Finland     7.0  2017  Social Democratic Party of Finland
Finland     7.0  2017  Green League
What I'd like to do is create a multi-index data frame that shows the different party names under a single country name, something like this:

Output:
COUNTRY  PTYNAME
Finland  Centre Party
         Finns Party
         National Coalition Party
         Social Democratic Party of Finland
         Green League
I used the method below:

df1 = df.groupby(['COUNTRY'])['PTYNAME'].sum()

but as a result, all the party names for a country get concatenated into a single string in one row.

Does anyone have an idea? Let me know if I need to clarify anything.
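To reproduce this without the CSV file, here is a minimal sketch that rebuilds the sample rows above as an inline DataFrame (a stand-in, since the real file isn't available). It shows why `sum()` packs everything into one row: on a column of Python strings, summing means concatenating, so each country collapses to a single string.

```python
import pandas as pd

# Inline stand-in for the CSV file (sample rows copied from the post above)
df = pd.DataFrame({
    "COUNTRY": ["Finland"] * 5,
    "CTRYID": [7.0] * 5,
    "YEAR": [2017] * 5,
    # dtype=object keeps plain Python strings, so sum() concatenates them
    "PTYNAME": pd.Series(
        ["Centre Party", "Finns Party", "National Coalition Party",
         "Social Democratic Party of Finland", "Green League"],
        dtype=object,
    ),
})

df1 = df.groupby(["COUNTRY"])["PTYNAME"].sum()
print(df1)

# One row per country; the party names are glued together with no separator
print(df1.loc["Finland"])
```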
You can do something like this:
import pandas as pd

df = pd.read_csv("parties.csv")
df = df.groupby(['COUNTRY', 'PTYNAME'])['YEAR'].count()
print(df)
And get something like this:
Output:
COUNTRY  PTYNAME
Finland  Finns Party                         1
         National Coalition Party            1
         Social Democratic Party of Finland  1
Norway   Centre Party                        1
         Green League                        1
But note that this is no longer a DataFrame; it is a MultiIndexed Series.
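If a regular DataFrame is needed again, `Series.reset_index()` moves the index levels back into ordinary columns. A minimal sketch, assuming an inline stand-in for `parties.csv` built from the rows in the output above; the column name `count` is an arbitrary choice.

```python
import pandas as pd

# Inline stand-in for parties.csv, using the rows from the output above
df = pd.DataFrame({
    "COUNTRY": ["Norway", "Finland", "Finland", "Finland", "Norway"],
    "CTRYID": [8.0, 7.0, 7.0, 7.0, 8.0],
    "YEAR": [2017] * 5,
    "PTYNAME": ["Centre Party", "Finns Party", "National Coalition Party",
                "Social Democratic Party of Finland", "Green League"],
})

counts = df.groupby(["COUNTRY", "PTYNAME"])["YEAR"].count()
print(type(counts).__name__)  # a Series indexed by (COUNTRY, PTYNAME)

# reset_index() turns the index levels into columns; name= labels the values
counts_df = counts.reset_index(name="count")
print(counts_df)
```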
(Jan-05-2024, 06:43 AM) deanhystad wrote: ... But this is no longer a dataframe, it is a multiindexed series.

Thanks so much. I'm pretty new to Python. I'm going to use the output together with another data frame, matching on the COUNTRY column, for some analyses later. Based on what you said, a multi-indexed series will be limited when it comes to manipulation (say, selecting columns or rows), is that right? Anyway, thank you again for your answer. It's exactly what I needed; I had spent more than a week trying to figure it out <3
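Selection on a MultiIndexed Series still works through `.loc`, so it is less limiting than it may sound. The sketch below rebuilds the Series from an inline stand-in for `parties.csv`, selects by country and by (country, party), and then matches on COUNTRY with another table via `reset_index()` and `merge()`. The second table, `other`, and its `REGION` column are hypothetical placeholders for whatever data you plan to combine.

```python
import pandas as pd

# Inline stand-in for parties.csv, rows taken from the output above
df = pd.DataFrame({
    "COUNTRY": ["Norway", "Finland", "Finland", "Finland", "Norway"],
    "YEAR": [2017] * 5,
    "PTYNAME": ["Centre Party", "Finns Party", "National Coalition Party",
                "Social Democratic Party of Finland", "Green League"],
})
counts = df.groupby(["COUNTRY", "PTYNAME"])["YEAR"].count()

# Select every party of one country (first index level)
finland = counts.loc["Finland"]
print(finland.index.tolist())

# Select a single (COUNTRY, PTYNAME) pair
print(counts.loc[("Norway", "Green League")])

# Hypothetical second table keyed by COUNTRY; REGION values are placeholders
other = pd.DataFrame({"COUNTRY": ["Finland", "Norway"], "REGION": ["Nordic", "Nordic"]})

# Move the index levels back into columns, then match rows on COUNTRY
merged = counts.reset_index(name="count").merge(other, on="COUNTRY", how="left")
print(merged)
```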
I think you want to do multi-indexing instead of grouping.
import pandas as pd

df = pd.read_csv("registers.csv")
df = df.set_index(['COUNTRY', 'PTYNAME'])
print(df)
Output:
                                             CTRYID  YEAR
COUNTRY  PTYNAME
Norway   Centre Party                           8.0  2017
Finland  Finns Party                            7.0  2017
         National Coalition Party               7.0  2017
         Social Democratic Party of Finland     7.0  2017
Norway   Green League                           8.0  2017
For a better look, sort the data before making the index.
import pandas as pd

df = pd.read_csv("registers.csv").sort_values(by=["COUNTRY", "PTYNAME"])
df = df.set_index(['COUNTRY', 'PTYNAME'])
print(df)
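Sorting can also be done after the index is built: `sort_index()` orders the rows by the index levels and gives the same grouped display. A minimal sketch with inline data standing in for `registers.csv` (rows taken from the output above); it also shows that, unlike the groupby result, this is still a DataFrame, so `.loc` on a country returns that country's parties with all their columns.

```python
import pandas as pd

# Inline stand-in for registers.csv, rows taken from the output above
df = pd.DataFrame({
    "COUNTRY": ["Norway", "Finland", "Finland", "Finland", "Norway"],
    "CTRYID": [8.0, 7.0, 7.0, 7.0, 8.0],
    "YEAR": [2017] * 5,
    "PTYNAME": ["Centre Party", "Finns Party", "National Coalition Party",
                "Social Democratic Party of Finland", "Green League"],
})

# Build the MultiIndex first, then sort by its levels (COUNTRY, then PTYNAME)
indexed = df.set_index(["COUNTRY", "PTYNAME"]).sort_index()
print(indexed)

# Still a DataFrame: selecting a country keeps the CTRYID and YEAR columns
print(indexed.loc["Norway"])
```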
Output:
                                             CTRYID  YEAR
COUNTRY  PTYNAME
Finland  Finns Party                            7.0  2017
         National Coalition Party               7.0  2017
         Social Democratic Party of Finland     7.0  2017
Norway   Centre Party                           8.0  2017
         Green League                           8.0  2017