Nov-16-2023, 07:07 PM
(Nov-15-2023, 08:21 PM)Mark17 Wrote:
(Nov-13-2023, 09:31 PM)deanhystad Wrote: What are you trying to do, count baby names by year? You could group your dataframe by year and baby name (groupby) and count the number of babies in each group.
This is a good exercise... I'll work on a .groupby solution.
I'm trying to get a frequency count.
.value_counts() is a start:
values = baby_names['NAME'].value_counts()
print(values)
This gets me a list (actually a Series) of name frequencies. I see I can also tack on .to_dict() and get a dictionary... but the names are keys and the frequencies are values. What I really want are the frequencies as keys and lists of names as values (since multiple names often occur with the same frequency)--and then I'd want to see that for each year.
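One way to get frequencies as keys and lists of names as values is to walk the value_counts() result and collect names under each frequency. This is only a minimal sketch: the tiny baby_names frame below is a made-up stand-in for the real data, and names_by_freq is a name chosen here for illustration.

```python
import pandas as pd

# Hypothetical stand-in for baby_names; only the NAME column matters here.
baby_names = pd.DataFrame({"NAME": ["ABBY", "ABBY", "ZOE", "ZOE", "AARAV"]})

counts = baby_names["NAME"].value_counts()  # Series: name -> frequency

# Invert the mapping: frequency -> list of names with that frequency.
names_by_freq = {}
for name, freq in counts.items():
    names_by_freq.setdefault(int(freq), []).append(name)

# Sort each list so the result does not depend on how ties were ordered.
for names in names_by_freq.values():
    names.sort()

print(names_by_freq)
```

A plain loop is used because several names can share one frequency, which .to_dict() alone cannot represent.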
I next tried a .groupby() solution...
print(baby_names.groupby(['BRITH_YEAR', 'NAME']))
...but I just get a groupby object:
Output:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000018E277041C0>
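The groupby object only describes how the rows are split; nothing is computed until an aggregation is called on it. As a hedged sketch, .size() returns the number of rows in each (year, name) group, and that result can be folded into a per-year dictionary of frequency -> names. The sample frame and the name names_by_year_and_freq are made up for illustration; the column names are copied from the post.

```python
import pandas as pd

# Hypothetical miniature of baby_names, one row per occurrence of a name.
baby_names = pd.DataFrame({
    "BRITH_YEAR": [2011, 2011, 2011, 2012, 2012],
    "NAME": ["ABBY", "ABBY", "ZOE", "ABBY", "ZOE"],
})

grouped = baby_names.groupby(["BRITH_YEAR", "NAME"])  # lazy: no result yet
rows_per_group = grouped.size()  # Series indexed by (BRITH_YEAR, NAME)

# Build year -> {frequency -> [names]} from the per-group row counts.
names_by_year_and_freq = {}
for (year, name), freq in rows_per_group.items():
    by_freq = names_by_year_and_freq.setdefault(int(year), {})
    by_freq.setdefault(int(freq), []).append(name)

print(rows_per_group)
print(names_by_year_and_freq)
```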
If I tack on .sum(), then I get this:
Output:
                    COUNT  RANK
BRITH_YEAR NAME
2011       AALIYAH    528   140
           AARAV       60   204
           AARON     1092   516
           ABBY        40   312
           ABDIEL      48   368
...                   ...   ...
2014       Zion        40    33
           Zissy       25    71
           Zoe        240    86
           Zoey       116   164
           Zuri        21    30
[4882 rows x 2 columns]
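This is a bit confusing. 'COUNT' and 'RANK' are the last two columns of the dataframe, and the numbers shown look like per-group totals of those columns rather than row counts (for 'ABBY' in 2011, 40 = 4 x 10 and 312 = 4 x 78). The actual number of times 'ABBY' appears is five (four in 2011 and one in 2012), so .sum() isn't counting occurrences of the names:
baby_names[baby_names['NAME'] == 'ABBY']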
Output:
      BRITH_YEAR  GENDER  ETHNICTY        NAME  COUNT  RANK
1295        2011  FEMALE  HISPANIC        ABBY     10    78
2767        2011  FEMALE  HISPANIC        ABBY     10    78
4267        2011  FEMALE  HISPANIC        ABBY     10    78
6230        2011  FEMALE  HISPANIC        ABBY     10    78
7852        2012  FEMALE  ASIAN AND PACI  ABBY     11    44
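To see the difference between adding up column values and counting rows, the five 'ABBY' rows above can be rebuilt by hand and aggregated both ways. This is a sketch on a hand-typed copy of those rows (the frame name abby is made up here, and GENDER and ETHNICTY are left out for brevity): .sum() totals COUNT and RANK within each group, while .size() reports how many rows each group has.

```python
import pandas as pd

# The five ABBY rows from the output above, retyped by hand.
abby = pd.DataFrame({
    "BRITH_YEAR": [2011, 2011, 2011, 2011, 2012],
    "NAME": ["ABBY"] * 5,
    "COUNT": [10, 10, 10, 10, 11],
    "RANK": [78, 78, 78, 78, 44],
})

grouped = abby.groupby(["BRITH_YEAR", "NAME"])

totals = grouped[["COUNT", "RANK"]].sum()  # adds the column values per group
rows = grouped.size()                      # counts the rows per group

print(totals)  # 2011: COUNT 40, RANK 312; 2012: COUNT 11, RANK 44
print(rows)    # 2011: 4 rows; 2012: 1 row
```

The 2011 totals match the ABBY line in the .sum() output (40 and 312), so that output is a sum of the numeric columns, not a frequency count.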
Lots of stuff here... I appreciate any light you can shine on these concepts to help me understand them!