find the average data from everyone in the same year

STUdevil · Oct-20-2024, 07:16 PM

I have some more problems with this code, but I think if I can overcome this one I can find a way to fix the others as well. I have a file with country, year, and life_expectancy, and I need to calculate the average life_expectancy across all countries for a chosen year. So the user types in a year and the program prints the average for that year. Here is the part where my code struggles the most.

if choose_year.lower() == year:
     year_expectancy = sum(expectancy) / len(expectancy)

I cannot use pandas or other libraries; it should be solved with only really basic coding. Thank you all for your help.

# An example of how the list looks
# germany,  ger,  2021,   20.663,
# germany,  ger,  1990,   17.638,
# brasil,   bra,  1999,   22.473,
# brasil,   bra,  2002,   7.982,
# England,  UK,   2021,   9.827,
# england,  UK,   1999,   14.672,
# japan,    ja,   2005,   20.661,
# japan,    ja,   2021,   16.836,
# mexico,   mx,   2008,   11.383,
# mexico,   mx,   1999,   26.837,



life_expectancy = open('D://life-expectancy.csv')

with open('D://life-expectancy.csv') as data:
    max_expectancy = 0
    min_expectancy = 99999999
    index = 0
    choose_year = 0
    year_expectancy = 0
    for line in data:
        data = line.strip()
        data = line.split(',')
        entity = data[0]
        code = data[1]
        year = int(data[2])
        expectancy = float(data[3])
         

        if expectancy > max_expectancy:
            max_expectancy = expectancy
            max_country = entity
            max_year = year

        if expectancy < min_expectancy:
            min_expectancy =expectancy
            min_country = entity 
            min_year = year  
            


choose_year = input('Enter the year of interest : ')


# always prints 0; this calculation does not work
if choose_year.lower() == year:
    year_expectancy = sum(expectancy) / len(expectancy)

print()     
print(f'The overall max life expectancy is: {max_expectancy} from {max_country} in {max_year}')
print(f'The overall min life expectancy is: {min_expectancy} from {min_country} in {min_year}')


       
print()
print(f'For the year {choose_year}:')
# here I need the result of the calculation
print(f'The average life expectancy across all countries was {year_expectancy}')
#     #print(f'The max life expectancy was in {} with {}')
#     #print(f'The min life expectancy was in {} with {}')

menator01 · Oct-20-2024, 10:14 PM

Maybe something like this. There are better ways.

import csv

file = 'test.csv'

# Create a list
tmplist = []
alist = []
tmp = []

# Get the data from csv file
with open(file, 'r') as data:
    for lines in data.readlines():
        tmplist.append(lines.strip().split(','))

# Do some cleanup of data
for lines in tmplist:
    line = [word.strip() for word in lines]
    alist.append(line)

year = input('Choose Year: ')

for index, item in enumerate(alist):
    if item[2] == year:
        tmp.append(float(item[3]))

if len(tmp) > 0:
    avg = sum(tmp)/len(tmp)

    print(f'Avg. {avg}')
else:
    print(f'There is no data for {year}')

**deanhystad** · Oct-20-2024, 10:14 PM

What have you tried? You should start by collecting your data in a list or dictionary so you can use it after it’s been read.
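
A minimal sketch of that idea, using only built-ins: each expectancy is stored in a dictionary under its year while the lines are read, so the average for any year can be looked up afterwards. The sample lines stand in for the file, and `by_year` and `average_for_year` are made-up names for illustration.

```python
# Sample lines standing in for the file contents (assumption: the same
# country, code, year, expectancy layout as the example data).
lines = [
    "germany,ger,2021,20.663",
    "germany,ger,1990,17.638",
    "brasil,bra,1999,22.473",
    "england,UK,2021,9.827",
    "england,UK,1999,14.672",
    "japan,ja,2021,16.836",
]

# Collect every expectancy under its year while reading.
by_year = {}
for line in lines:
    country, code, year, expectancy = [part.strip() for part in line.split(",")]
    by_year.setdefault(int(year), []).append(float(expectancy))


def average_for_year(by_year, year):
    """Return the mean expectancy for year, or None if the year is absent."""
    values = by_year.get(year)
    if not values:
        return None
    return sum(values) / len(values)


print(average_for_year(by_year, 2021))
print(average_for_year(by_year, 1800))  # None: no rows for that year
```

In the real program the loop would run over the open file instead of `lines`, and the year typed by the user would need `int(...)` because `input()` returns a string.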

STUdevil · Oct-21-2024, 04:35 AM

I did sort them into countries, abbreviations, years and data, and I can work with these four parts, but now I struggle when I need the whole line from all entries with the same year.

menator01 · Oct-21-2024, 04:40 AM

In my example, I query the list for the whole line using an if statement.

**perfringo** · Oct-21-2024, 09:35 AM

This task can be broken into steps:

- read data from file
- find max and min values of column and get data from that row
- filter column data and perform calculations.

# data in life-expectancy.csv as this:
germany,ger,2021,20.663
germany,ger,1990,17.638
brasil,bra,1999,22.473
brasil,bra,2002,7.982
England,UK,2021,9.827
england,UK,1999,14.672
japan,ja,2005,20.661
japan,ja,2021,16.836
mexico,mx,2008,11.383
mexico,mx,1999,26.837

# read data from file, convert values and create list of dictionaries:

import csv

with open("life-expectancy.csv", "r", newline="") as csvfile:
    processing = {"Country": str, "Abbreviation": str, "Year": int, "Life expectancy": float}
    reader = csv.DictReader(csvfile, fieldnames=processing)
    data = [{k:processing[k](v) for k, v in line.items()} for line in reader]

# data is:
[{'Country': 'germany', 'Abbreviation': 'ger', 'Year': 2021, 'Life expectancy': 20.663}, 
{'Country': 'germany', 'Abbreviation': 'ger', 'Year': 1990, 'Life expectancy': 17.638}, 
{'Country': 'brasil', 'Abbreviation': 'bra', 'Year': 1999, 'Life expectancy': 22.473}, 
{'Country': 'brasil', 'Abbreviation': 'bra', 'Year': 2002, 'Life expectancy': 7.982}, 
{'Country': 'England', 'Abbreviation': 'UK', 'Year': 2021, 'Life expectancy': 9.827}, 
{'Country': 'england', 'Abbreviation': 'UK', 'Year': 1999, 'Life expectancy': 14.672}, 
{'Country': 'japan', 'Abbreviation': 'ja', 'Year': 2005, 'Life expectancy': 20.661}, 
{'Country': 'japan', 'Abbreviation': 'ja', 'Year': 2021, 'Life expectancy': 16.836}, 
{'Country': 'mexico', 'Abbreviation': 'mx', 'Year': 2008, 'Life expectancy': 11.383}, 
{'Country': 'mexico', 'Abbreviation': 'mx', 'Year': 1999, 'Life expectancy': 26.837}]

There are built-in min and max functions that can be applied to the data (the code contains repetition, and I would leave that as a challenge for the OP :-)):

longest = max(data, key=lambda x: x["Life expectancy"])
shortest = min(data, key=lambda x: x["Life expectancy"])

print(f"Happiest life in {longest['Country']} in {longest['Year']} for {longest['Life expectancy']}")
print(f"Saddest life in {shortest['Country']} in {shortest['Year']} for {shortest['Life expectancy']}")

# outputs
Happiest life in mexico in 1999 for 26.837
Saddest life in brasil in 2002 for 7.982

For filtering data based on a target value, we can define a helper function that yields the rows matching the criteria (once again, I left some things to solve):

def filter_rows(data, **kwargs):
    for row in data:
        for key, value in kwargs.items():
            if row.get(key) != value:
                break
        else:
            yield row

print(sum(row["Life expectancy"] for row in filter_rows(data, Year=2021)))
# 47.326
records = filter_rows(data, Country="brasil")
print(*records)
#{'Country': 'brasil', 'Abbreviation': 'bra', 'Year': 1999, 'Life expectancy': 22.473} 
# {'Country': 'brasil', 'Abbreviation': 'bra', 'Year': 2002, 'Life expectancy': 7.982}
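
One possible way to get from the sum to an average, as a sketch: a generator has no `len()`, so the matching values are collected into a list first. `average_expectancy` is a hypothetical helper name, and the three rows are copied from the sample data so the block runs on its own.

```python
def filter_rows(data, **kwargs):
    """Yield rows whose fields equal every keyword argument (same helper as above)."""
    for row in data:
        for key, value in kwargs.items():
            if row.get(key) != value:
                break
        else:
            yield row


def average_expectancy(data, **criteria):
    """Mean of "Life expectancy" over matching rows, or None if none match."""
    values = [row["Life expectancy"] for row in filter_rows(data, **criteria)]
    return sum(values) / len(values) if values else None


# Three rows copied from the sample data.
data = [
    {"Country": "germany", "Abbreviation": "ger", "Year": 2021, "Life expectancy": 20.663},
    {"Country": "brasil", "Abbreviation": "bra", "Year": 2002, "Life expectancy": 7.982},
    {"Country": "japan", "Abbreviation": "ja", "Year": 2021, "Life expectancy": 16.836},
]

print(average_expectancy(data, Year=2021))
print(average_expectancy(data, Year=1800))  # None: nothing matches
```

Returning None for an empty match avoids a ZeroDivisionError when the user asks for a year that is not in the file.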

Pedroski55 · Oct-21-2024, 03:55 PM

I would just use DictReader from the csv module. DictReader gives you a dictionary for every row in your csv. The dictionary keys are the column headers.

csv.reader and DictReader are lazy iterators that read one row at a time, so the reader object itself is very small in memory. People come here and say, "I have a csv with 10 million rows, how to deal with that?"

import csv

path2csv = '/home/pedro/temp/life_expect.csv'

# get a list of dictionaries with the data from the csv
with open(path2csv) as infile:
    data = list(csv.DictReader(infile))
    
cc = input('What country do you want to know the average life span for? ')
year = input('Which year are you thinking of? ')
# didn't use year here
for d in data:
    if d['country'] == cc:
        print(d['life'])

Output:
20.663
17.638

da = csv.DictReader(open(path2csv))  # DictReader expects an open file, not a path string
             
type(da)
             
<class 'csv.DictReader'>
import sys
sys.getsizeof(da)

Output:
48
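
A small sketch of what that laziness allows: the average for one year can be accumulated row by row, without ever building a list of all rows. The `io.StringIO` object stands in for an open file, and the header names `code` and `year` are assumptions (only `country` and `life` appear in the loop above).

```python
import csv
import io

# In-memory stand-in for an open file (assumption: a header row named
# country,code,year,life).
infile = io.StringIO(
    "country,code,year,life\n"
    "germany,ger,2021,20.663\n"
    "brasil,bra,1999,22.473\n"
    "japan,ja,2021,16.836\n"
)

wanted_year = "2021"  # DictReader yields strings, so compare as text
total = 0.0
count = 0
for row in csv.DictReader(infile):  # only one row is held at a time
    if row["year"] == wanted_year:
        total += float(row["life"])
        count += 1

average = total / count if count else None
print(average)
```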

You can do this without csv by simulating the DictReader: get the first row as keys, then make a dictionary of every row, using the headers as keys.
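
Roughly like this, as a sketch: `dict_rows` is a made-up helper, the header names are assumptions, and unlike the csv module it does not handle quoted fields that contain commas.

```python
def dict_rows(lines):
    """Yield one dict per data line, keyed by the names in the header line."""
    lines = iter(lines)
    header_line = next(lines, None)
    if header_line is None:
        return  # empty input: nothing to yield
    header = [name.strip() for name in header_line.split(",")]
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        values = [value.strip() for value in line.split(",")]
        yield dict(zip(header, values))


# Stand-in for an open file; any iterable of lines works the same way.
sample = [
    "country,code,year,life\n",
    "germany,ger,2021,20.663\n",
    "germany,ger,1990,17.638\n",
    "japan,ja,2021,16.836\n",
]

rows = list(dict_rows(sample))
print(rows[0])
```

All values stay strings, just as with DictReader, so the year and expectancy still need `int()` and `float()` before doing arithmetic.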

**deanhystad** · Oct-21-2024, 06:20 PM

This is vague:

Quote: I cannot use pandas or other libraries

Does this mean you can only use built-ins? No imports (csv for example)?

STUdevil · Oct-21-2024, 07:05 PM

(Oct-21-2024, 06:20 PM)deanhystad Wrote: This is vague:

Quote: I cannot use pandas or other libraries
Does this mean you can only use built-ins? No imports (csv for example)?

I can only import the data, but no other program or library that helps me work with it.

**deanhystad** · Oct-21-2024, 08:39 PM

If you cannot use any libraries, even a standard library like csv, you'll need to do something like menator's example. Just ignore where menator imports csv and then doesn't use it. Using names from your example code:

with open("life_expectancy.csv", "r") as file:
    data = []
    for line in file:
        country, code, year, expectancy = line.split(",")
        data.append((country.strip(), code.strip(), int(year), float(expectancy)))

print(data)

Output:
[('germany', 'ger', 2021, 20.663), ('germany', 'ger', 1990, 17.638), ('brasil', 'bra', 1999, 22.473), ('brasil', 'bra', 2002, 7.982), ('england', 'uk', 2021, 9.827), ('england', 'uk', 1999, 14.672), ('japan', 'ja', 2005, 20.661), ('japan', 'ja', 2021, 16.836), ('mexico', 'mx', 2008, 11.383), ('mexico', 'mx', 1999, 26.837)]

Now you have a list where each element in the list is a line from your expectancy file. You can use this list to extract information. For example, I can get all the data for Japan.

japan_data = [entry for entry in data if entry[0] == 'japan']
print(japan_data)

Output:
[('japan', 'ja', 2005, 20.661), ('japan', 'ja', 2021, 16.836)]

You can also use functions like max, min and sorted on these values. This is a better way to find the max and min expectancy:

expectancy = lambda x: x[3]
print("Max life expectancy =", max(data, key=expectancy))
print("Min life expectancy =", min(data, key=expectancy))

Output:
Max life expectancy = ('mexico', 'mx', 1999, 26.837)
Min life expectancy = ('brasil', 'bra', 2002, 7.982)

Or print a table of the data in chronological order.

country = lambda x: x[0]
year = lambda x: x[2]
for entry in sorted(data, key=year):
    print(f"{year(entry)} {country(entry):10} {expectancy(entry):6.3f}")

Output:
1990 germany    17.638
1999 brasil     22.473
1999 england    14.672
1999 mexico     26.837
2002 brasil      7.982
2005 japan      20.661
2008 mexico     11.383
2021 germany    20.663
2021 england     9.827
2021 japan      16.836
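
The same list can answer the original question. In the first post, `choose_year.lower() == year` compares a string with an int, so it is never true, and `expectancy` holds only the last row's float, which `sum()` cannot take. A sketch of one way around both, where `year_summary` is a hypothetical helper name and the four tuples are a short stand-in for the full list:

```python
# A few (country, code, year, expectancy) tuples in the shape built above.
data = [
    ("germany", "ger", 2021, 20.663),
    ("brasil", "bra", 1999, 22.473),
    ("england", "uk", 2021, 9.827),
    ("japan", "ja", 2021, 16.836),
]


def year_summary(data, year):
    """Return (average, max_entry, min_entry) for year, or None if no rows match."""
    rows = [entry for entry in data if entry[2] == year]
    if not rows:
        return None
    values = [entry[3] for entry in rows]
    average = sum(values) / len(values)
    highest = max(rows, key=lambda entry: entry[3])
    lowest = min(rows, key=lambda entry: entry[3])
    return average, highest, lowest


# In the real program: chosen_year = int(input("Enter the year of interest: "))
chosen_year = int("2021")
summary = year_summary(data, chosen_year)
if summary is None:
    print(f"There is no data for {chosen_year}")
else:
    average, highest, lowest = summary
    print(f"For the year {chosen_year}:")
    print(f"The average life expectancy across all countries was {average:.3f}")
    print(f"The max life expectancy was in {highest[0]} with {highest[3]}")
    print(f"The min life expectancy was in {lowest[0]} with {lowest[3]}")
```

Converting the input with `int()` before comparing is what makes the year match the integers stored in the tuples.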
