Python Forum

Full Version: Median of the age row - tsv file
Problem: Write a program that opens a tsv file and calculates the mean age of the females.

I opened it with this code:
import csv
with open("047.tsv") as tsvfile:
    tsvreader = csv.reader(tsvfile, delimiter="\t")
    for line in tsvreader:
        print(line[1:])
And it prints something like this:

['age', 'gender']
['20', 'female']
['30', 'female']
['21', 'female']
['23', 'female']
['30', 'female']
['25', 'female']
['13', 'female']
['19', 'female']
['16', 'female']
['25', 'female']
['20', 'female']
['25', 'male']
['27', 'male']
['43', 'male']

How do I continue from here to get the median of only the "age" column?
for number in age:
??

Do I need to import statistics to make it easier to get the mean (with statistics.mean())?
I would need to define number and age first too. I'm not really sure how to start with this.
Before your loop, create an empty list, perhaps called female_ages. Inside your for-loop, check the gender, and if it's female, add the age to your list (the ages are read as strings, so you'll need to turn each one into an int or float before adding it). After the loop, you can call statistics.mean(), the function you asked about, on the list you populated.
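The steps above could look roughly like the sketch below. It is only one way to do it and makes a few assumptions: the header names "age" and "gender" are taken from the printed output, csv.DictReader is swapped in for csv.reader so the columns can be looked up by name, and the helper name female_ages() and the first column name "id" are made up for the example (the original post never shows the first column). The sample data is rebuilt from the rows printed in the question so the snippet runs without 047.tsv.

```python
import csv
import io
import statistics


def female_ages(tsvfile):
    """Return the ages (as ints) of every row whose gender is 'female'."""
    ages = []  # the empty list created before the loop
    reader = csv.DictReader(tsvfile, delimiter="\t")
    for row in reader:
        if row["gender"] == "female":
            ages.append(int(row["age"]))  # ages are read as strings
    return ages


# Rows copied from the output in the question.
rows = [
    (20, "female"), (30, "female"), (21, "female"), (23, "female"),
    (30, "female"), (25, "female"), (13, "female"), (19, "female"),
    (16, "female"), (25, "female"), (20, "female"),
    (25, "male"), (27, "male"), (43, "male"),
]
# "id" is a hypothetical name for the unseen first column.
sample = io.StringIO(
    "id\tage\tgender\n"
    + "".join(f"{i}\t{age}\t{gender}\n" for i, (age, gender) in enumerate(rows, 1))
)

ages = female_ages(sample)
print(statistics.mean(ages))    # 22
print(statistics.median(ages))  # 21
```

With the real file, the call would be something like `with open("047.tsv", newline="") as tsvfile: ages = female_ages(tsvfile)`.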
I would suggest using pandas for this kind of I/O.
pandas has many helpful, customizable functions, e.g. read_csv:

import pandas as pd
data = pd.read_csv('047.tsv', sep='\t')
females = data[data['gender'] == 'female']  # keep only the female rows
females['age'].mean()  # mean female age; use .median() for the median
Pandas is overkill for this, especially if it's an intro Python course, where they would likely not allow third-party libraries. This can be done easily in Python without any imports, or with modules from the standard library. (The statistics module has been part of the standard library since Python 3.4.)
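As a rough illustration of the no-imports route, here is a sketch that splits each line on tabs and computes the mean and median by hand. The function names (parse_female_ages, mean, median) and the "id" header for the first column are invented for the example, and the lines are hard-coded from the output in the question so it runs without the file.

```python
def parse_female_ages(lines):
    """Split tab-separated lines and return the female ages as ints."""
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    header = rows[0]
    age_col = header.index("age")        # find columns by header name
    gender_col = header.index("gender")
    return [int(r[age_col]) for r in rows[1:] if r[gender_col] == "female"]


def mean(values):
    return sum(values) / len(values)


def median(values):
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:                 # odd count: the middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values


# Lines as they would come from iterating over the open file;
# "id" is a hypothetical name for the unseen first column.
lines = ["id\tage\tgender\n"]
ages_in_post = [20, 30, 21, 23, 30, 25, 13, 19, 16, 25, 20]
lines += [f"{i}\t{age}\tfemale\n" for i, age in enumerate(ages_in_post, 1)]
lines += ["12\t25\tmale\n", "13\t27\tmale\n", "14\t43\tmale\n"]

female = parse_female_ages(lines)
print(mean(female))    # 22.0
print(median(female))  # 21
```

With the real file, `with open("047.tsv") as f: female = parse_female_ages(f)` should work the same way, since iterating over a file yields its lines.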