Python Forum
Why does newly-formed dict only consist of last row of each year?
#1
Hi all,

I'm trying to convert two columns ('BIRTH_YEAR', 'NAME') of baby_names into a dictionary. Why does the newly-formed dictionary only consist of the last df row of each year?

baby_name_dict = {}
print(baby_names.info(), '\n')
baby_name_dict = dict(zip(baby_names.BIRTH_YEAR, baby_names.NAME))
print(f'baby_name_dict is:  {baby_name_dict}.')
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13962 entries, 0 to 13961
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   BIRTH_YEAR  13962 non-null  int64
 1   GENDER      13962 non-null  object
 2   ETHNICTY    13962 non-null  object
 3   NAME        13962 non-null  object
 4   COUNT       13962 non-null  int64
 5   RANK        13962 non-null  int64
dtypes: int64(3), object(3)
memory usage: 654.6+ KB
None

baby_name_dict is: {2011: 'ZEV', 2012: 'ZEV', 2013: 'Zev', 2014: 'Zev'}.
Getting myself ready for a really foolish oversight... :)
#2
Dictionary keys are unique, so for each year the last value seen (here, the name) wins.
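To make the overwriting visible, here is a minimal sketch. The four-row frame is made up and only stands in for baby_names; names_by_year is a hypothetical variable name. It shows dict(zip(...)) keeping one name per year, and one possible way to keep every name by grouping first.

```python
import pandas as pd

# Made-up miniature stand-in for baby_names (assumption, not the real data)
baby_names = pd.DataFrame({
    "BIRTH_YEAR": [2011, 2011, 2012, 2012],
    "NAME": ["AALIYAH", "ZEV", "ABBY", "ZEV"],
})

# Every repeated year overwrites the earlier entry, so only the
# last row seen for each year survives.
last_wins = dict(zip(baby_names.BIRTH_YEAR, baby_names.NAME))

# Grouping by year first keeps all names: one list per year.
names_by_year = baby_names.groupby("BIRTH_YEAR")["NAME"].agg(list).to_dict()

print(last_wins)
print(names_by_year)
```

If the rows happen to be sorted by name within each year, the surviving value will be the alphabetically last name, which would explain seeing 'ZEV' for every year.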
#3
What are you trying to do, count baby names by year? You could group your dataframe by (groupby) year and baby name and count the number of babies in each group.
#4
(Nov-13-2023, 09:31 PM)deanhystad Wrote: What are you trying to do, count baby names by year? You could group your dataframe by (groupby) year and baby name and count the number of babies in each group.

This is a good exercise... I'll work on a .groupby solution.

I'm trying to get a frequency count.
#5
(Nov-15-2023, 08:21 PM)Mark17 Wrote: I'm trying to get a frequency count.

.value_counts() is a start:
values = baby_names['NAME'].value_counts()
print(values)
This gets me a list (actually a Series) of name frequencies. I see I can also tack on .to_dict() and get a dictionary... but then the names are keys and the frequencies are values. What I really want are the frequencies as keys and lists of names as values (since multiple names often occur with the same frequency), and then I'd want to see that for each year.
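One possible way to get frequencies as keys and lists of names as values, per year, is sketched below. The data is made up, and names_by_frequency and per_year are hypothetical names introduced only for illustration; this is one approach under those assumptions, not the only one.

```python
from collections import defaultdict

import pandas as pd

# Made-up sample rows (assumption): one row per occurrence of a name
baby_names = pd.DataFrame({
    "BIRTH_YEAR": [2011, 2011, 2011, 2011, 2011, 2012, 2012, 2012],
    "NAME": ["ABBY", "ABBY", "ZEV", "ZEV", "AARON", "ABBY", "ZEV", "ZEV"],
})


def names_by_frequency(names):
    """Invert value_counts(): frequency -> sorted list of names."""
    inverted = defaultdict(list)
    for name, freq in names.value_counts().items():
        inverted[freq].append(name)
    return {freq: sorted(group) for freq, group in inverted.items()}


# Build one frequency dictionary per year.
per_year = {
    year: names_by_frequency(group)
    for year, group in baby_names.groupby("BIRTH_YEAR")["NAME"]
}

print(per_year)
```

Because several names can share a frequency, the values have to be lists; a plain dict keyed by frequency would run into the same "last one wins" problem as before.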

I next tried a .groupby() solution...
print(baby_names.groupby(['BRITH_YEAR', 'NAME']))
...but I just get a groupby object:
Output:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000018E277041C0>
If I tack on .sum(), then I get this:
Output:
                    COUNT  RANK
BRITH_YEAR NAME
2011       AALIYAH    528   140
           AARAV       60   204
           AARON     1092   516
           ABBY        40   312
           ABDIEL      48   368
...                   ...   ...
2014       Zion        40    33
           Zissy       25    71
           Zoe        240    86
           Zoey       116   164
           Zuri        21    30

[4882 rows x 2 columns]
This is a bit confusing. 'COUNT' and 'RANK' are the last two column names. The actual number of times 'ABBY' appears is five (four in 2011 and one in 2012), so .sum() isn't counting occurrences of the names:
baby_names[baby_names['NAME'] == 'ABBY']
Output:
      BRITH_YEAR  GENDER        ETHNICTY  NAME  COUNT  RANK
1295        2011  FEMALE        HISPANIC  ABBY     10    78
2767        2011  FEMALE        HISPANIC  ABBY     10    78
4267        2011  FEMALE        HISPANIC  ABBY     10    78
6230        2011  FEMALE        HISPANIC  ABBY     10    78
7852        2012  FEMALE  ASIAN AND PACI  ABBY     11    44
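The difference between the two results can be reproduced with just the ABBY rows. In this sketch the values and column names are copied from the output above, while the variable names are made up: .sum() adds up the COUNT and RANK values inside each group, whereas .size() counts the rows in each group.

```python
import pandas as pd

# The five ABBY rows from the output above (only the columns needed here)
abby = pd.DataFrame({
    "BRITH_YEAR": [2011, 2011, 2011, 2011, 2012],
    "NAME": ["ABBY"] * 5,
    "COUNT": [10, 10, 10, 10, 11],
    "RANK": [78, 78, 78, 78, 44],
})

# groupby() alone only describes the grouping; nothing is computed
# until an aggregation is applied to it.
grouped = abby.groupby(["BRITH_YEAR", "NAME"])

totals = grouped.sum()        # sums COUNT and RANK within each group
occurrences = grouped.size()  # number of rows in each group

print(totals)
print(occurrences)
```

For 2011 the sums are 4 x 10 = 40 and 4 x 78 = 312, matching the ABBY line in the .sum() output, while .size() gives the four and one occurrences.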
Lots of stuff here... I appreciate any light you can shine on these concepts to help me understand them!
#6
I was thinking of something like this:
from random import choice
import pandas as pd

# Make up some data for processing: 100 random names for each year 2000-2004
baby_names = pd.DataFrame(
    [{"Year": year, "Name": choice('ABCD')} for _ in range(100) for year in range(2000, 2005)]
)
# Count how many times each name occurs in each year
stats = baby_names.groupby(["Year", "Name"]).agg(Count=("Name", "count"))
# Express each count as a percentage of that year's total
stats["%"] = 100 * stats["Count"] / stats.groupby("Year")["Count"].transform('sum')
# Within each year, list the most common names first
stats.sort_values(by=["Year", "%"], ascending=[True, False], inplace=True)
print(stats)
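Since the data above is random, the percentage step may be easier to follow with fixed numbers. This is a small sketch with made-up counts, and year_totals is a name introduced here: transform('sum') returns one value per row, namely the total of that row's year, so the division lines up row by row.

```python
import pandas as pd

# Made-up counts (assumption), indexed the same way as stats above
stats = pd.DataFrame(
    {"Count": [3, 1, 2, 2]},
    index=pd.MultiIndex.from_tuples(
        [(2000, "A"), (2000, "B"), (2001, "A"), (2001, "B")],
        names=["Year", "Name"],
    ),
)

# Same length as stats: every row carries the total of its own year.
year_totals = stats.groupby("Year")["Count"].transform("sum")

stats["%"] = 100 * stats["Count"] / year_totals
print(stats)
```

A plain .sum() on the grouped object would instead return one row per year, which could not be divided into the per-name counts without realigning the index.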
#7
(Nov-17-2023, 04:54 AM)deanhystad Wrote: I was thinking of something like this:

Interesting. Lots of stuff in there... I will study that and try to apply it. Thanks so much!