Python Forum
find the average data from everyone in the same year
#1
I have a few more problems with this code, but I think if I can get past this one I can find a way to fix the others as well. I have a file with country, year, and life_expectancy, and I need to calculate the average life_expectancy across all countries for a chosen year. The user types in a year and the program prints the average for that year. Here is the part where my code struggles the most:
if choose_year.lower() == year:
     year_expectancy = sum(expectancy) / len(expectancy)
I cannot use pandas or other libraries; it should be solved with only really basic coding. Thank you all for your help.

# An example of how the list works
# germany,  ger,  2021,   20.663,
# germany,  ger,  1990,   17.638,
# brasil,   bra,  1999,   22.473,
# brasil,   bra,  2002,   7.982,
# England,  UK,   2021,   9.827,
# england   UK,   1999,   14.672,
# japan,    ja,   2005,   20.661,
# japan,    ja,   2021,   16.836,
# mexico,   mx,   2008,   11.383,
# mexico,   mx,   1999,   26.837,



life_expectancy = open('D://life-expectancy.csv')

with open('D://life-expectancy.csv') as data:
    max_expectancy = 0
    min_expectancy = 99999999
    index = 0
    choose_year = 0
    year_expectancy = 0
    for line in data:
        data = line.strip()
        data = line.split(',')
        entity = data[0]
        code = data[1]
        year = int(data[2])
        expectancy = float(data[3])
         

        if expectancy > max_expectancy:
            max_expectancy = expectancy
            max_country = entity
            max_year = year

        if expectancy < min_expectancy:
            min_expectancy =expectancy
            min_country = entity 
            min_year = year  
            


choose_year = input('Enter the year of interest : ')


# always prints 0; this calculation does not work
if choose_year.lower() == year:
    year_expectancy = sum(expectancy) / len(expectancy)

print()     
print(f'The overall max life expectancy is: {max_expectancy} from {max_country} in {max_year}')
print(f'The overall min life expectancy is: {min_expectancy} from {min_country} in {min_year}')


       
print()
print(f'For the year {choose_year}:')
# here I need the result of the calculation
print(f'The average life expectancy across all countries was {year_expectancy}')
#     #print(f'The max life expenctancy was in {} with {}')
#     #print(f'The min life expectancy was in {} with{}')
#2
Maybe something like this. There are better ways.

import csv

file = 'test.csv'

# Create a list
tmplist = []
alist = []
tmp = []

# Get the data from csv file
with open(file, 'r') as data:
    for lines in data.readlines():
        tmplist.append(lines.strip().split(','))

# Do some cleanup of data
for lines in tmplist:
    line = [word.strip() for word in lines]
    alist.append(line)

year = input('Choose Year: ')

for index, item in enumerate(alist):
    if item[2] == year:
        tmp.append(float(item[3]))

if len(tmp) > 0:
    avg = sum(tmp)/len(tmp)

    print(f'Avg. {avg}')
else:
    print(f'There is no data for {year}')
I welcome all feedback.
The only dumb question, is one that doesn't get asked.
#3
What have you tried? You should start by collecting your data in a list or dictionary so you can use it after it’s been read.
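A minimal sketch of that idea, using only built-ins (no imports). The sample lines stand in for the file contents and assume four comma-separated fields with no trailing comma; the names `sample_lines`, `rows`, and `average_for_year` are made up for illustration.

```python
# Sample lines standing in for the contents of life-expectancy.csv
# (assumed layout: country, code, year, life expectancy).
sample_lines = [
    "germany,ger,2021,20.663",
    "germany,ger,1990,17.638",
    "brasil,bra,1999,22.473",
    "brasil,bra,2002,7.982",
    "England,UK,2021,9.827",
    "england,UK,1999,14.672",
    "japan,ja,2005,20.661",
    "japan,ja,2021,16.836",
    "mexico,mx,2008,11.383",
    "mexico,mx,1999,26.837",
]

# Collect every line first, so the data is still available after reading.
rows = []
for line in sample_lines:
    country, code, year, expectancy = [part.strip() for part in line.split(",")]
    rows.append({
        "country": country,
        "code": code,
        "year": int(year),
        "expectancy": float(expectancy),
    })


def average_for_year(rows, year):
    """Return the mean expectancy for one year, or None if there is no data."""
    values = [row["expectancy"] for row in rows if row["year"] == year]
    if not values:
        return None
    return sum(values) / len(values)


print(average_for_year(rows, 1999))
```

With the real file you would loop over the open file object instead of `sample_lines`, and convert the text from `input()` with `int()` before passing it in.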
#4
I did sort them into countries, abbreviations, years and data, and I can work with these four parts. But now I struggle when I need the whole line from every entry with the same year.
#5
In my example, I query the list for the whole line using an if statement.
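As a rough sketch of that kind of query (the tuple layout and the name `rows_for_year` are assumptions for illustration, not code from the thread): keep every row whose year matches, and convert the typed year to an int first, because a string from input() never compares equal to an int year.

```python
# Each row is (country, code, year, expectancy); values taken from the sample data.
rows = [
    ("germany", "ger", 2021, 20.663),
    ("brasil", "bra", 1999, 22.473),
    ("England", "UK", 2021, 9.827),
    ("japan", "ja", 2021, 16.836),
    ("mexico", "mx", 1999, 26.837),
]


def rows_for_year(rows, chosen_year):
    """Return the whole row for every entry recorded in chosen_year."""
    matches = []
    for row in rows:
        if row[2] == chosen_year:
            matches.append(row)
    return matches


typed = "2021"             # what input() would return: always a string
chosen_year = int(typed)   # convert so it can match the int years in rows
matches = rows_for_year(rows, chosen_year)
for row in matches:
    print(row)
```

Passing the unconverted string `"2021"` returns an empty list, which is the same reason the average in the first post stays at 0.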
#6
This task can be broken into steps:

- read data from file
- find max and min values of column and get data from that row
- filter column data and perform calculations.

# data in life-expectancy.csv as this:
germany,ger,2021,20.663
germany,ger,1990,17.638
brasil,bra,1999,22.473
brasil,bra,2002,7.982
England,UK,2021,9.827
england,UK,1999,14.672
japan,ja,2005,20.661
japan,ja,2021,16.836
mexico,mx,2008,11.383
mexico,mx,1999,26.837

# read data from file, convert values and create list of dictionaries:

import csv

with open("life-expectancy.csv", "r", newline="") as csvfile:
    processing = {"Country": str, "Abbreviation": str, "Year": int, "Life expectancy": float}
    reader = csv.DictReader(csvfile, fieldnames=processing)
    data = [{k:processing[k](v) for k, v in line.items()} for line in reader]

# data is:
[{'Country': 'germany', 'Abbreviation': 'ger', 'Year': 2021, 'Life expectancy': 20.663}, 
{'Country': 'germany', 'Abbreviation': 'ger', 'Year': 1990, 'Life expectancy': 17.638}, 
{'Country': 'brasil', 'Abbreviation': 'bra', 'Year': 1999, 'Life expectancy': 22.473}, 
{'Country': 'brasil', 'Abbreviation': 'bra', 'Year': 2002, 'Life expectancy': 7.982}, 
{'Country': 'England', 'Abbreviation': 'UK', 'Year': 2021, 'Life expectancy': 9.827}, 
{'Country': 'england', 'Abbreviation': 'UK', 'Year': 1999, 'Life expectancy': 14.672}, 
{'Country': 'japan', 'Abbreviation': 'ja', 'Year': 2005, 'Life expectancy': 20.661}, 
{'Country': 'japan', 'Abbreviation': 'ja', 'Year': 2021, 'Life expectancy': 16.836}, 
{'Country': 'mexico', 'Abbreviation': 'mx', 'Year': 2008, 'Life expectancy': 11.383}, 
{'Country': 'mexico', 'Abbreviation': 'mx', 'Year': 1999, 'Life expectancy': 26.837}]
There are built-in min and max functions and they can be applied to data (the code below contains repetition, and I would leave removing it as a challenge for the OP :-)):

longest = max(data, key=lambda x: x["Life expectancy"])
shortest = min(data, key=lambda x: x["Life expectancy"])

print(f"Happiest life in {longest['Country']} in {longest['Year']} for {longest['Life expectancy']}")
print(f"Saddest life in {shortest['Country']} in {shortest['Year']} for {shortest['Life expectancy']}")

# outputs
Happiest life in mexico in 1999 for 26.837
Saddest life in brasil in 2002 for 7.982
For filtering data based on target value we can define helper function to yield rows which match the criteria (once again, I left some things to solve):

def filter_rows(data, **kwargs):
    for row in data:
         for key, value in kwargs.items():
             if row.get(key) != value:
                 break
         else:
             yield row

print(sum(row["Life expectancy"] for row in filter_rows(data, Year=2021)))
# 47.326
records = filter_rows(data, Country="brasil")
print(*records)
#{'Country': 'brasil', 'Abbreviation': 'bra', 'Year': 1999, 'Life expectancy': 22.473} 
# {'Country': 'brasil', 'Abbreviation': 'bra', 'Year': 2002, 'Life expectancy': 7.982}
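One possible way to finish the average, sketched under the assumption that the rows are dictionaries shaped like the ones above. The helper name `average` is hypothetical, and only a few of the sample rows are repeated here to keep the block self-contained.

```python
# A subset of the sample rows, in the same shape as the list of dictionaries above.
data = [
    {"Country": "germany", "Abbreviation": "ger", "Year": 2021, "Life expectancy": 20.663},
    {"Country": "brasil", "Abbreviation": "bra", "Year": 1999, "Life expectancy": 22.473},
    {"Country": "England", "Abbreviation": "UK", "Year": 2021, "Life expectancy": 9.827},
    {"Country": "japan", "Abbreviation": "ja", "Year": 2021, "Life expectancy": 16.836},
    {"Country": "mexico", "Abbreviation": "mx", "Year": 1999, "Life expectancy": 26.837},
]


def filter_rows(data, **kwargs):
    """Yield rows where every keyword argument matches the row's value."""
    for row in data:
        for key, value in kwargs.items():
            if row.get(key) != value:
                break
        else:
            yield row


def average(values):
    """Mean of an iterable of numbers, or None when it is empty."""
    values = list(values)  # a generator has no len(), so materialize it first
    if not values:
        return None
    return sum(values) / len(values)


avg_2021 = average(row["Life expectancy"] for row in filter_rows(data, Year=2021))
print(round(avg_2021, 3))
```

Returning None for an empty selection avoids a ZeroDivisionError when the chosen year has no rows.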
#7
I would just use DictReader from the csv module. DictReader gives you a dictionary for every row in your csv. The dictionary keys are the column headers.

csv.reader and DictReader are iterators that read one row at a time, so they use very little memory as long as you loop over them instead of building a list from them. People come here and say, "I have a csv with 10 million rows, how to deal with that?"

import csv

path2csv = '/home/pedro/temp/life_expect.csv'

# get a list of dictionaries with the data from the csv
with open(path2csv) as infile:
    data = list(csv.DictReader(infile))
    
cc = input('What country do you want to know the average life span for? ')
year = input('Which year are you thinking of? ')
# didn't use year here
for d in data:
    if d['country'] == cc:
        print(d['life'])
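Output:
20.663
17.638
# DictReader expects an open file (or any iterable of lines), not a path string
infile = open(path2csv)
da = csv.DictReader(infile)

type(da)

<class 'csv.DictReader'>
import sys
# getsizeof reports the size of the reader object itself, not of the file's contents
sys.getsizeof(da)
Output:
48
infile.close()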
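To make the memory point concrete, here is a small sketch that averages one year while looping over the reader row by row, without building a list. `io.StringIO` stands in for the open file, and the header names `code` and `year` are assumptions (only `country` and `life` appear in the code above); `average_life_for_year` is a made-up name.

```python
import csv
import io

# Stand-in for the csv file; the header row is assumed.
csv_text = """\
country,code,year,life
germany,ger,2021,20.663
germany,ger,1990,17.638
England,UK,2021,9.827
japan,ja,2021,16.836
mexico,mx,1999,26.837
"""


def average_life_for_year(lines, year):
    """Average the 'life' column for one year, reading one row at a time."""
    total = 0.0
    count = 0
    for row in csv.DictReader(lines):
        # Values read from a csv are strings, so compare with the string year.
        if row["year"].strip() == year:
            total += float(row["life"])
            count += 1
    return total / count if count else None


avg = average_life_for_year(io.StringIO(csv_text), "2021")
print(avg)
```

Only a running total and a count are kept, so the memory used does not grow with the number of rows.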
You can do this without csv by simulating the DictReader: get the first row as keys, then make a dictionary of every row, using the headers as keys.
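A minimal sketch of that simulation with built-ins only. The header row and the name `read_dicts` are assumptions for illustration, and a plain split on commas will not handle quoted fields that contain commas the way the csv module does.

```python
# Stand-in for the lines of the file; the first line is the assumed header row.
lines = [
    "country,code,year,life",
    "germany,ger,2021,20.663",
    "germany,ger,1990,17.638",
    "brasil,bra,1999,22.473",
]


def read_dicts(lines):
    """Yield one dictionary per data line, keyed by the header line."""
    lines = iter(lines)
    header = [name.strip() for name in next(lines).split(",")]
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        values = [value.strip() for value in line.split(",")]
        yield dict(zip(header, values))


records = list(read_dicts(lines))
for record in records:
    print(record)
```

As with DictReader, every value comes back as a string, so the year and the life expectancy still need `int()` and `float()` before any arithmetic.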
#8
This is vague:
Quote:i can not use panda or other libarys
Does this mean you can only use built-ins? No imports (csv for example)?
#9
(Oct-21-2024, 06:20 PM)deanhystad Wrote: This is vague:
Quote:i can not use panda or other libarys
Does this mean you can only use built-ins? No imports (csv for example)?

I can only import the data, but no other program or library that helps me work with it.
#10
If you cannot use any libraries, even a standard library like csv, you'll need to do something like menator's example. Just ignore where menator imports csv and then doesn't use it. Using names from your example code:
with open("life_expectancy.csv", "r") as file:
    data = []
    for line in file:
        country, code, year, expectancy = line.split(",")
        data.append((country.strip(), code.strip(), int(year), float(expectancy)))

print(data)
Output:
[('germany', 'ger', 2021, 20.663), ('germany', 'ger', 1990, 17.638), ('brasil', 'bra', 1999, 22.473), ('brasil', 'bra', 2002, 7.982), ('england', 'uk', 2021, 9.827), ('england', 'uk', 1999, 14.672), ('japan', 'ja', 2005, 20.661), ('japan', 'ja', 2021, 16.836), ('mexico', 'mx', 2008, 11.383), ('mexico', 'mx', 1999, 26.837)]
Now you have a list where each element in the list is a line from your expectancy file. You can use this list to extract information. For example, I can get all the data for Japan.
japan_data = [entry for entry in data if entry[0] == 'japan']
print(japan_data)
Output:
[('japan', 'ja', 2005, 20.661), ('japan', 'ja', 2021, 16.836)]
You can also use functions like max, min and sorted on these values. This is a better way to find the max and min expectancy:
expectancy = lambda x: x[3]
print("Max life expectancy =", max(data, key=expectancy))
print("Min life expectancy =", min(data, key=expectancy))
Output:
Max life expectancy = ('mexico', 'mx', 1999, 26.837)
Min life expectancy = ('brasil', 'bra', 2002, 7.982)
Or print a table of the data in chronological order.
country = lambda x: x[0]
year = lambda x: x[2]
for entry in sorted(data, key=year):
    print(f"{year(entry)} {country(entry):10} {expectancy(entry):6.3f}")
Output:
1990 germany    17.638
1999 brasil     22.473
1999 england    14.672
1999 mexico     26.837
2002 brasil      7.982
2005 japan      20.661
2008 mexico     11.383
2021 germany    20.663
2021 england     9.827
2021 japan      16.836
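Putting the pieces together for the original question, here is a sketch that reports the average, max and min for one chosen year from the same kind of tuple list. The names `year_summary` and `get_expectancy` are made up, and the year is passed in as an int (the text from input() would need int() first).

```python
# (country, code, year, expectancy) tuples, as in the list printed above.
data = [
    ("germany", "ger", 2021, 20.663), ("germany", "ger", 1990, 17.638),
    ("brasil", "bra", 1999, 22.473), ("brasil", "bra", 2002, 7.982),
    ("england", "uk", 2021, 9.827), ("england", "uk", 1999, 14.672),
    ("japan", "ja", 2005, 20.661), ("japan", "ja", 2021, 16.836),
    ("mexico", "mx", 2008, 11.383), ("mexico", "mx", 1999, 26.837),
]


def get_expectancy(entry):
    """Key function: the life expectancy field of one entry."""
    return entry[3]


def year_summary(data, chosen_year):
    """Return average, max entry and min entry for one year, or None if no data."""
    entries = [entry for entry in data if entry[2] == chosen_year]
    if not entries:
        return None
    values = [get_expectancy(entry) for entry in entries]
    return {
        "average": sum(values) / len(values),
        "max": max(entries, key=get_expectancy),
        "min": min(entries, key=get_expectancy),
    }


summary = year_summary(data, 1999)
print(f"For the year 1999:")
print(f"The average life expectancy across all countries was {summary['average']:.3f}")
print(f"The max life expectancy was in {summary['max'][0]} with {summary['max'][3]}")
print(f"The min life expectancy was in {summary['min'][0]} with {summary['min'][3]}")
```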