python range function and data aggregation

stullis · Dec-27-2018, 09:59 PM

You're getting only 2009 because of the loop on lines 6 through 9 of your script. It opens each file and overwrites the variable frame on every pass. So, at the end of the loop, frame holds only the data from the last file read, which is 2009.
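To see the overwrite problem without any files, here is a minimal sketch. The load_year function is a hypothetical stand-in for pd.read_csv, and the numbers it returns are made up for illustration only.

```python
# Hypothetical stand-in for pd.read_csv: returns a small dict per year.
# The "births" values are invented purely for illustration.
def load_year(year):
    return {"year": year, "births": year * 10}

# Overwriting: each pass replaces the previous value of frame.
for year in range(2007, 2010):
    frame = load_year(year)
last_only = frame  # only the final year survives

# Accumulating: each pass appends to a list, so nothing is lost.
pieces = []
for year in range(2007, 2010):
    pieces.append(load_year(year))
years_kept = [piece["year"] for piece in pieces]

print(last_only["year"])
print(years_kept)
```

The second loop is the pattern to use with real data: collect every frame in a list first, then combine them afterwards.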

Also, it's stopping at 2009 because range excludes its upper limit: the loop ends before the index reaches that value. So, you're getting 1880 through 2009 instead of through 2010. Increase the upper limit by one to include 2010.
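A quick way to check the end point yourself, as a small sketch using only the built-in range (the variable names are just for illustration):

```python
# range(start, stop) yields start up to, but not including, stop.
years_short = list(range(1880, 2010))  # stops at 2009
years_full = list(range(1880, 2011))   # stops at 2010

print(years_short[-1])
print(years_full[-1])
print(len(years_full))  # number of yearly files the loop will read
```

Wrapping range in list() works the same way on Python 2.7 and Python 3.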

Are you sure you're using 2.7? In 2.7, print was a statement and didn't need parentheses.
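If you're not sure which interpreter is actually running your script, a small check like this will tell you. It uses only the standard sys module; the variable names are mine.

```python
import sys

# sys.version_info holds the interpreter version as a tuple-like object.
major, minor = sys.version_info[:2]

# Python 3 made print a function; in Python 2 it is a statement.
uses_print_function = major >= 3

# A single parenthesized argument prints the same on both versions.
print("Running Python %d.%d" % (major, minor))
```

Keep in mind that print(x) with one argument also runs on 2.7, because the parentheses are read as grouping, so parentheses alone don't prove you're on Python 3.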

This could correct your issues:

import pandas as pd

columns = ['name', 'sex', 'births']

# Single-year check: total births by sex for 1880
names1880 = pd.read_csv('C:/names/yob1880.txt', names=columns)
group = names1880.groupby('sex').births.sum()

# Collect one DataFrame per year instead of overwriting frame each time
pieces = []
for year in range(1880, 2011):  # the stop value is excluded, so 2011 includes 2010
    path = 'C:/names/yob%d.txt' % year
    frame = pd.read_csv(path, names=columns)
    frame['year'] = year
    pieces.append(frame)

# Combine all years into one DataFrame with a fresh index
names = pd.concat(pieces, ignore_index=True)
total_births = names.pivot_table('births', index='year', columns='sex', aggfunc='sum')
print(total_births)
