Python Forum

Full Version: Separating Names & Counting
Hi,
I'm doing a Uni course & I'm a bit stuck & would appreciate some help please.

I have a column in a data frame that contains data as per below (no spacing)
ATTENDEES
John,Jan,Paul
Kylie,Paul,Scott,Jason
Jan,Scott,John

I'm trying to separate the data so I can show who is attending in a list like
ATTENDEES
John
Jan
Paul
Kylie
Scott
Jason

Then be able to count how many times each person is attending
ATTENDEES
John 2
Jan 2
Paul 2
Kylie 1
Scott 2
Jason 1

I've tried splitting the names using
names=masterlist['attendees'].str.split(pat = ',', expand = True)

and counting by using
CountNames=names.groupby(0).count()

but it does not include all the data: it only groups on the 1st new column from the split,
and the counting is done across all the other columns.

Thanks in advance
Steve
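For anyone following along, here is a minimal sketch of what that split and groupby actually produce. The data frame is built inline from the three sample rows, which is an assumption since the real masterlist is not shown, and the column is named ATTENDEES to match the header above.

```python
import pandas as pd

# Stand-in for the real data frame, built from the three sample rows above
masterlist = pd.DataFrame({"ATTENDEES": ["John,Jan,Paul",
                                         "Kylie,Paul,Scott,Jason",
                                         "Jan,Scott,John"]})

# expand=True spreads the names across columns 0, 1, 2, 3 (short rows are padded)
names = masterlist["ATTENDEES"].str.split(pat=",", expand=True)
print(names)

# groupby(0) makes one group per distinct value in column 0 only,
# and count() then counts the non-missing cells in columns 1, 2 and 3
count_names = names.groupby(0).count()
print(count_names)
```

Only John, Kylie and Jan can appear as groups because they are the only values in column 0, which is why the other names seem to go missing.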
Please show what you have tried, working or not.
(Nov-03-2022, 07:32 AM) Larz60+ wrote: Please show what you have tried, working or not.

import pandas as pd
import os
party = pd.read_csv('party.csv')
CountPeople=party['ATTENDEES'].str.split(pat = ',', expand = True)
CountPeople=CountPeople.groupby(0).count()
party
CountPeople
I've also included an attachment, so you can see the outputs.
Post deleted. It's in homework. Providing the solution is not so good.
OK, so on further reading I've found this post.

how often does a word occur in this column?
https://python-forum.io/thread-10857.htm...ght=genres

If I copy & paste my data into python as per the example it works, but when I try & read it from the CSV file it does not work.

import pandas as pd
from collections import Counter
party = pd.read_csv('party.csv')
data = party['ATTENDEES']
data_list = data.replace('\n', ',')
data_list = data_list.strip().split(',')
print(Counter(data_list).most_common(15))
Could someone please explain why it works when I use this
data = 'John,Jan,Paul,Kylie,Paul,Scott,Jason,Jan,Scott,John'
but this does not work?
party = pd.read_csv('party.csv')
data = party['ATTENDEES']
I'd assume it has something to do with multiple rows or not adding a comma after each row.
You are trying to treat data_list like a str. It is not a str, it is a pandas Series, so str methods like strip() and split() are not available on it directly.

Maybe you could join the ATTENDEES values together to make a single str.
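A quick way to see the difference, as a rough sketch: the data frame below is a stand-in for pd.read_csv('party.csv'), assuming the file holds the three sample rows from the first post.

```python
import pandas as pd

# Stand-in for pd.read_csv('party.csv'), using the sample rows from the first post
party = pd.DataFrame({"ATTENDEES": ["John,Jan,Paul",
                                    "Kylie,Paul,Scott,Jason",
                                    "Jan,Scott,John"]})
data = party["ATTENDEES"]

print(type(data).__name__)     # Series, not str
print(hasattr(data, "strip"))  # False: the str methods live under data.str instead

# Joining the rows gives one plain str, which does have strip() and split()
joined = ",".join(data)
print(type(joined).__name__)   # str
print(joined.split(","))
```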
It worked, thanks for the guidance.
Was there another way I could have done this?

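import pandas as pd
from collections import Counter
party = pd.read_csv('party.csv')
data = party['ATTENDEES']
# Join every row into one comma-separated str, then split it back into individual names
names = data.str.cat(sep=',').strip().split(',')
print(Counter(names).most_common())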
Output:
[('John', 2), ('Jan', 2), ('Paul', 2), ('Scott', 2), ('Kylie', 1), ('Jason', 1)]

Also, is there any way of listing the names up/down rather than left/right?
Thanks
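One other way, sketched here with pandas only: split each row into a list, give every name its own row with explode(), then tally with value_counts(). The data frame is again an inline stand-in for party.csv, assuming it holds the sample rows from the first post.

```python
import pandas as pd

# Stand-in for pd.read_csv('party.csv'), using the sample rows from the first post
party = pd.DataFrame({"ATTENDEES": ["John,Jan,Paul",
                                    "Kylie,Paul,Scott,Jason",
                                    "Jan,Scott,John"]})

# Split each row into a list, then explode() gives every name its own row
names = party["ATTENDEES"].str.split(",").explode()

# value_counts() tallies each distinct name, most frequent first
counts = names.value_counts()
print(counts)
```

Printing a Series lists one name per line, so this also gives an up/down listing without any extra formatting.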
I was thinking of using str.join().
import pandas as pd
df = pd.read_csv("party.csv")
names = ",".join(df["ATTENDEES"]).split(",")
print(names)
Output:
['John', 'Jan', 'Paul', 'Kylie', 'Paul', 'Scott', 'Jason', 'Jan', 'Scott', 'John']
Quote: Also is there any way of listing the names up/down rather than left/right?
Of course, but you'll need to control the printing yourself instead of using the default str representation of a list.

You could use join().
print("\n".join(names))
Output:
John
Jan
Paul
Kylie
Paul
Scott
Jason
Jan
Scott
John
Or you could print a name at a time in a for loop.
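A minimal sketch of the for loop version, using the names list from the output above. The second loop is an extra illustration, not something from the posts: it applies the same idea to the Counter results so each count also prints on its own line.

```python
from collections import Counter

names = ['John', 'Jan', 'Paul', 'Kylie', 'Paul', 'Scott', 'Jason', 'Jan', 'Scott', 'John']

# One name per line
for name in names:
    print(name)

# The same idea works for the counts: one "name count" pair per line
counts = Counter(names)
for name, count in counts.most_common():
    print(name, count)
```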