Python Forum
'Age' categorical (years -months -days ) to numeric - Printable Version




'Age' categorical (years -months -days ) to numeric - Smiling29 - Oct-13-2019

I have a dataset with an Age column that contains values like the following:

df_s7['Age'].unique()
array(['28 Years', '10 Month(s) 15 Day(s)', '46 Years', '65 Years', '45 Years', '30 Years', '47 Years', '17 Years', '55 Years', '50 Years', '39 Years', '42 Years', '38 Years', '40 Years', '20 Years', ' < 1 Year', '29 Years', '43 Years', '31 Years', '36 Years', '11 Years', '48 Years', '23 Years', '25 Years', '32 Years', '82 Years', '44 Years', '37 Years', '52 Years', '35 Years', '18 Years', '19 Years', '49 Years', '62 Years', '51 Years', '72 Years', '26 Years', '54 Years', '24 Years', '59 Years', '34 Years', '53 Years', '14 Years', '71 Years', '27 Years', '66 Years', '33 Years', '22 Years', '70 Years', '60 Years', '21 Years', '3 Month(s) 11 Day(s)', '58 Years', '56 Years', '63 Years', '5 Years', '64 Years', '10 Years', '16 Years', '15 Years', '75 Years', '57 Years', '2 Years', '83 Years', '77 Years', '74 Years', '13 Years', '41 Years', '69 Years', '1 Month(s) 29 Day(s)', '8 Years', '7 Month(s) 16 Day(s)', '61 Years', '67 Years', '1 Month(s) 30 Day(s)', '84 Years', '1 Month(s) 12 Day(s)', '6 Month(s) 26 Day(s)', '12 Years', '5 Month(s) 18 Day(s)', '68 Years', '80 Years', '3 Month(s) 19 Day(s)', '76 Years', '86 Years', '7 Month(s) 2 Day(s)', '1 Years', '73 Years', '90 Years', '6 Month(s) 20 Day(s)', '79 Years', '89 Years', '9 Years', '3 Month(s) 29 Day(s)', '8 Month(s) 21 Day(s)', '4 Years', '6 Month(s) 8 Day(s)', '78 Years', '6 Years', '87 Years', '7 Years', '6 Month(s) 9 Day(s)', '4 Month(s) 20 Day(s)', '10 Month(s) 16 Day(s)', '4 Month(s) 11 Day(s)', '6 Month(s) 18 Day(s)', '4 Month(s) 13 Day(s)'], dtype=object)

I want to group the ages into buckets and then plot a histogram of the groups, using something like the function below:

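def age_buckets(x):
    # x is the age in years as a number; ages under one year are fractions
    if x < 1:
        return '0-1'
    elif x < 18:  # was x < 17, which put 17-year-olds into '18-29'
        return '1-17'
    elif x < 30:
        return '18-29'
    elif x < 40:
        return '30-39'
    elif x < 50:
        return '40-49'
    elif x < 60:
        return '50-59'
    elif x < 70:
        return '60-69'
    elif x >= 70:
        return '70+'
    else:
        return 'other'  # only reached when x is NaN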
There is also a "Sex" column with the values Male, Female, and Transgender. I want to:
1) plot a 1D histogram of the Age column alone, split into the age groups above and color-coded;
2) plot Age together with Sex for the same age groups, also color-coded.

Please advise
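
Both plots come down to counting rows per age group, and per age group and sex. Below is a minimal sketch of that counting step using only the standard library. The `rows` data is made up for illustration and assumes the Age_group labels have already been computed.

```python
from collections import Counter

# Hypothetical (Age_group, Sex) pairs, invented for illustration only.
rows = [
    ("18-29", "Male"),
    ("18-29", "Female"),
    ("30-39", "Female"),
    ("0-1", "Male"),
    ("18-29", "Male"),
]

# Bar heights for plot 1: one bar per age group.
by_group = Counter(group for group, _ in rows)

# Bar heights for plot 2: one bar per (age group, sex) combination.
by_group_and_sex = Counter(rows)

for group, count in sorted(by_group.items()):
    print(group, count)
for (group, sex), count in sorted(by_group_and_sex.items()):
    print(group, sex, count)
```

With a DataFrame, the same counts could probably be obtained with `df.groupby(['Age_group', 'Sex']).size().unstack()` and drawn with `.plot(kind='bar')`, which gives one color per sex. That assumes an `Age_group` column has already been added.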


RE: 'Age' categorical (years -months -days ) to numeric - Smiling29 - Oct-14-2019

Additional details:
What I expect in the output:
Output:
Age(years)  Age_group
1           0-1
0.2         0-1
16          1-17
or
Output:
Age(years)  Age_group
1           Infant
0           Infant
16          Teen
Whichever is the better approach for plotting. Please advise.
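
One way to produce an Age_group column like this is a lookup over sorted bucket edges. The sketch below uses `bisect` from the standard library. The names `EDGES`, `LABELS`, and `age_group` are illustrative, and the edges assume that 17-year-olds belong in '1-17' and that an age of exactly 1 falls in '1-17', as the `x < 1` test in the first post implies.

```python
from bisect import bisect_right

# Exclusive upper edges of each bucket, in years (assumed from the first post).
EDGES = [1, 18, 30, 40, 50, 60, 70]
# One more label than edges: the last label covers everything from 70 upward.
LABELS = ["0-1", "1-17", "18-29", "30-39", "40-49", "50-59", "60-69", "70+"]

def age_group(years):
    """Return the bucket label for a numeric age in years."""
    # bisect_right counts how many edges are <= years, which is the bucket index.
    return LABELS[bisect_right(EDGES, years)]

print("Age(years)  Age_group")
for age in [0.2, 16, 28, 70]:
    print(f"{age:<12}{age_group(age)}")
```

In pandas, `pd.cut` with `right=False` and a `labels` list is the usual tool for the same job. The named variant (Infant, Teen, ...) would only need a different `LABELS` list.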


RE: 'Age' categorical (years -months -days ) to numeric - Smiling29 - Oct-16-2019

I was thinking of splitting the Age column into separate columns for years, months, and days, but I can't work out what to split on. Can someone please help with this?
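
Since the values have no single delimiter to split on, a regular expression that picks out each number together with its unit may be easier. This is only a sketch: `split_age` is a made-up helper name, and returning all zeros for ' < 1 Year' is an assumption, because that value carries no exact age.

```python
import re

# A number followed by its unit, e.g. "10 Month(s)" or "28 Years".
PART = re.compile(r"(\d+)\s*(Year|Month|Day)", re.IGNORECASE)

def split_age(text):
    """Return (years, months, days) as ints; parts that are absent are 0."""
    if "<" in text:
        # Assumption: ' < 1 Year' has no exact value, so report zeros.
        return (0, 0, 0)
    parts = {"year": 0, "month": 0, "day": 0}
    for number, unit in PART.findall(text):
        parts[unit.lower()] = int(number)
    return (parts["year"], parts["month"], parts["day"])

print(split_age("28 Years"))               # (28, 0, 0)
print(split_age("10 Month(s) 15 Day(s)"))  # (0, 10, 15)
print(split_age(" < 1 Year"))              # (0, 0, 0)
```

On a DataFrame this could be applied row by row with `df_s7['Age'].apply(split_age)`. Without the `"<"` check, ' < 1 Year' would be read as one full year, which is probably not what is wanted.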


RE: 'Age' categorical (years -months -days ) to numeric - perfringo - Oct-17-2019

If you only need the years, the simplest way is:

In [1]: lst = ['28 Years', '10 Month(s) 15 Day(s)', '46 Years', '65 Years', '45 Years',
   ...:        '30 Years', '47 Years', '17 Years', '55 Years', '50 Years', '39 Years', '42 Years',
   ...:        '38 Years', '40 Years', '20 Years', ' < 1 Year', '29 Years', '43 Years', '31 Years',
   ...:        '36 Years', '11 Years', '48 Years', '23 Years', '25 Years', '32 Years', '82 Years',
   ...:        '44 Years', '37 Years', '52 Years', '35 Years', '18 Years', '19 Years', '49 Years',
   ...:        '62 Years', '51 Years', '72 Years', '26 Years', '54 Years', '24 Years', '59 Years',
   ...:        '34 Years', '53 Years', '14 Years', '71 Years', '27 Years', '66 Years', '33 Years',
   ...:        '22 Years', '70 Years', '60 Years', '21 Years', '3 Month(s) 11 Day(s)', '58 Years',
   ...:        '56 Years', '63 Years', '5 Years', '64 Years', '10 Years', '16 Years', '15 Years',
   ...:        '75 Years', '57 Years', '2 Years', '83 Years', '77 Years', '74 Years', '13 Years',
   ...:        '41 Years', '69 Years', '1 Month(s) 29 Day(s)', '8 Years', '7 Month(s) 16 Day(s)',
   ...:        '61 Years', '67 Years', '1 Month(s) 30 Day(s)', '84 Years', '1 Month(s) 12 Day(s)',
   ...:        '6 Month(s) 26 Day(s)', '12 Years', '5 Month(s) 18 Day(s)', '68 Years', '80 Years',
   ...:        '3 Month(s) 19 Day(s)', '76 Years', '86 Years', '7 Month(s) 2 Day(s)', '1 Years',
   ...:        '73 Years', '90 Years', '6 Month(s) 20 Day(s)', '79 Years', '89 Years', '9 Years',
   ...:        '3 Month(s) 29 Day(s)', '8 Month(s) 21 Day(s)', '4 Years', '6 Month(s) 8 Day(s)',
   ...:        '78 Years', '6 Years', '87 Years', '7 Years', '6 Month(s) 9 Day(s)',
   ...:        '4 Month(s) 20 Day(s)', '10 Month(s) 16 Day(s)', '4 Month(s) 11 Day(s)',
   ...:        '6 Month(s) 18 Day(s)', '4 Month(s) 13 Day(s)']
   ...:
In [2]: [int(age.split(' Years')[0]) if 'Years' in age else 0 for age in lst] 
Every age that has no 'Years' part (including ' < 1 Year') becomes zero years; for all the others, the year count is taken as an int.


RE: 'Age' categorical (years -months -days ) to numeric - Smiling29 - Oct-17-2019

This is great! I can now get the years every time, but I'm struggling with the rows that contain months and days, or only days. For all those rows, which currently come out as zero, I want the months and days converted to a fraction of a year instead.



For example, if the row has 4 Months 13 Days, then (4/12) + (13/365) = 0.3689497716894977 should be my row value.

I have been trying, but I haven't been able to get the results with a function yet.
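
A possible function for this, as a sketch rather than a definitive answer: it finds every number and unit in the string and applies the formula above (months/12 + days/365). The name `age_to_years` and the `under_one` parameter are illustrative, and mapping ' < 1 Year' to 0.0 is an assumption that simply matches the zero the list comprehension already produces.

```python
import re

# A number followed by its unit, e.g. "4 Month(s)" or "13 Day(s)".
PART = re.compile(r"(\d+)\s*(Year|Month|Day)", re.IGNORECASE)

# Divisors from the formula in the post: years/1 + months/12 + days/365.
DIVISOR = {"year": 1, "month": 12, "day": 365}

def age_to_years(text, under_one=0.0):
    """Convert strings like '4 Month(s) 13 Day(s)' to an age in years."""
    if "<" in text:
        # Assumption: ' < 1 Year' has no exact value; use a chosen default.
        return under_one
    return sum(int(n) / DIVISOR[unit.lower()] for n, unit in PART.findall(text))

print(age_to_years("28 Years"))                        # 28.0
print(round(age_to_years("4 Month(s) 13 Day(s)"), 4))  # 0.3689
print(age_to_years(" < 1 Year"))                       # 0.0
```

Applied to the column, this would be something like `df_s7['Age'].apply(age_to_years)`, after which the numeric result can go straight into `age_buckets`.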