Python Forum
python range function and data aggregation
Thread Rating:
  • 3 Vote(s) - 2.33 Average
  • 1
  • 2
  • 3
  • 4
  • 5
python range function and data aggregation
#1
Dear friends I am a experienced business analyst having worked on excel ad-dons/rapid-miner / qlikview for a number of years , but new to python coding

i am having problems with the following code

-----------------------------------------------------------------------------------------------

import pandas as pd
names1880 = pd.read_csv('C:/names/yob1880.txt', names=['name', 'sex', 'births'])
group = names1880.groupby('sex').births.sum()
pieces = []
years = range(1880,2010)
for year in years:
    columns = ['name', 'sex', 'births']
    path = 'C:/names/yob%d.txt' % year
    frame = pd.read_csv(path, names=columns)
frame['year'] = year
pieces.append(frame)
print(pieces)
names = pd.concat(pieces, ignore_index=False)
total_births = names.pivot_table('births', index='year', columns='sex', aggfunc=sum)
print (total_births)
---------------------------------------------------------------------------------------------

i am getting aggregated output for the year 2009, however I need aggregated output for all years between years 1880 to 2010

Using python 2.7
Reply
#2
You're getting 2009 only because of your loop from lines 6 through 9. The script is opening each file and overwriting the variable frame each time. So, at the end of the loop, frame only has the data from the most recent file - which is 2009.

Also, it's stopping at 2009 because range ends when the index equals the upper limit. So, you're getting 1880 through 2009 instead of through 2010. The upper limit needs to be increased by one to include that value.

Are you sure you're using 2.7? Print was a command in 2.7 and didn't take parenthesis.

This could correct your issues:

import pandas as pd

names1880 = pd.read_csv('C:/names/yob1880.txt', names=['name', 'sex', 'births'])
group = names1880.groupby('sex').births.sum()
pieces = []

for year in range(1880,2011):
    columns = ['name', 'sex', 'births']
    path = 'C:/names/yob%d.txt' % year
    frame = pd.read_csv(path, names=columns)
    frame['year'] = year
    pieces.append(frame)

print(pieces)
names = pd.concat(pieces, ignore_index=False)
total_births = names.pivot_table('births', index='year', columns='sex', aggfunc=sum)
print (total_births)
Reply
#3
Many thanks
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Better visualisation for wide range data oraib 0 1,289 Oct-16-2021, 10:07 AM
Last Post: oraib
  Cycle through Numpy range within another range(?) Zero01 0 1,993 Jul-31-2020, 02:37 PM
Last Post: Zero01
  Pandas - Dynamic column aggregation based on another column theroadbacktonature 0 3,011 Apr-17-2020, 04:54 PM
Last Post: theroadbacktonature

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020