Python Forum

Need help improving function that reads file into list of tuples
Hi everyone,

I'm having some slight difficulty getting this function to do exactly what I need it to do.

Essentially, it reads in file.csv and needs to store various columns in a "master_list" of tuples.

My function works, but I'm sure it could be improved, and it does not return exactly the output I need.

These are the columns of interest:
Quote:
year = int(line[2])
month = int(line[3])
magnitude = float(line[9])
location = line[19]
latitude = float(line[20])
longitude = float(line[21])
deaths = int(line[23])
missing = int(line[25])
injuries = int(line[27])
damages = float(line[29])

And the requirement:
Quote:
If the number of deaths, missing, injured, and damages columns are empty, replace
it with a zero. If any other numerical data cannot be made into an int or float,
skip that entire line of data. Create a tuple with items in this order:
tup = (year,month,magnitude,location,latitude,longitude,\
deaths,missing,injuries,damages)
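One way to read that requirement is as a small conversion rule for the four optional columns: a blank cell becomes 0, and anything else goes through int() or float(). The helper name parse_optional is hypothetical, and letting a non-blank but non-numeric value raise ValueError (so the caller can skip the line) is an assumption about how the assignment wants those cells handled.

```python
def parse_optional(text, convert):
    """Convert an optional column (deaths, missing, injuries, damages).

    Hypothetical helper: a blank cell becomes the integer 0; anything else
    is passed to `convert` (int or float), which raises ValueError if the
    text is not numeric, letting the caller decide to skip the row.
    """
    text = text.strip()
    if text == "":
        return 0
    return convert(text)


print(parse_optional("", float))      # 0
print(parse_optional("48.5", float))  # 48.5
print(parse_optional("41", int))      # 41
```

Returning a plain 0 for blank damages matches the expected output further down, where blank damages appear as 0 and filled ones as floats such as 6000.0.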

Here's my code:

def read_file(fp):
    next(fp, None)
    masterList = []
    tup = ()

    for col in csv.reader(fp, delimiter=',', skipinitialspace=True):
        year = col[2]
        month = col[3]
        magnitude = col[9]
        location = col[19]
        latitude = col[20]
        longitude = col[21]
        deaths = col[23]
        missing = col[25]
        injured = col[27]
        damages = col[29]

        try:
            year = int(year)
            month = int(month)
            magnitude = float(magnitude)
            latitude = float(latitude)
            longitude = float(longitude)
        except:
            continue
            
                    
        if deaths.isdigit() == True:
            if int(deaths) > 0:
                deaths = int(deaths)
        elif deaths == '':
            deaths = int('0')
        else:
            deaths = int('0')
            
        if missing.isdigit() == True:
            if int(missing) > 0:
                missing = int(missing)
        elif missing == '':
            missing = int('0')
        else:
            missing = int('0')
            
            
        if injured.isdigit() == True:
            if int(injured) > 0:
                injured = int(injured)
        elif injured == '':
            injured = int('0')
        else:
            injured = int('0')
            
        if isinstance(damages, float) == True:
            try:
                damages = float(damages)
                if damages:
                    damages = int('0')
            except:
                damages = int('0')
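A side issue in the deaths/missing/injured branches above: when a cell holds the text "0", isdigit() is true but int(deaths) > 0 is false, so the variable is never converted and stays the string '0'. A standalone reproduction of that branch:

```python
# Same branch structure as the deaths check above, fed a cell containing "0".
deaths = "0"
if deaths.isdigit() == True:
    if int(deaths) > 0:
        deaths = int(deaths)
elif deaths == "":
    deaths = int("0")
else:
    deaths = int("0")

print(repr(deaths))  # '0'  (still a str, not the integer 0)
```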
I think the biggest problem with my function right now is that the last column (damages) is not always converted correctly. If the cell in the CSV is blank, it needs to be 0; otherwise, it needs to be read as a float. I can't seem to get this right. Can anybody offer some suggestions?


Expected output:
[(2020, 1, 6.0, 'CHINA:  XINJIANG PROVINCE', 39.831, 77.106, 1, 0, 2, 0), (2020, 1, 6.7, 'TURKEY:  ELAZIG AND MALATYA PROVINCES', 38.39, 39.081, 41, 0, 1600, 0), (2020, 1, 7.7, 'CUBA: GRANMA;  CAYMAN IS;  JAMAICA', 19.44, -78.755, 0, 0, 0, 0), (2020, 2, 6.0, 'TURKEY: VAN;  IRAN', 38.482, 44.367, 10, 0, 60, 0), (2020, 3, 5.4, 'BALKANS NW:  CROATIA:  ZAGREB', 45.897, 15.966, 1, 0, 27, 6000.0), (2020, 3, 5.7, 'USA: UTAH', 40.751, -112.078, 0, 0, 0, 48.5)]
My output (notice that the last value in the final, Utah, tuple is 0 instead of 48.5):

[(2020, 1, 6.0, 'CHINA:  XINJIANG PROVINCE', 39.831, 77.106, 1, 0, 2, 0), (2020, 1, 6.7, 'TURKEY:  ELAZIG AND MALATYA PROVINCES', 38.39, 39.081, 41, 0, 1600, 0), (2020, 1, 7.7, 'CUBA: GRANMA;  CAYMAN IS;  JAMAICA', 19.44, -78.755, 0, 0, 0, 0), (2020, 2, 6.0, 'TURKEY: VAN;  IRAN', 38.482, 44.367, 10, 0, 60, 0), (2020, 3, 5.4, 'BALKANS NW:  CROATIA:  ZAGREB', 45.897, 15.966, 1, 0, 27, 6000.0), (2020, 3, 5.7, 'USA: UTAH', 40.751, -112.078, 0, 0, 0, 0)]
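For comparison, here is a minimal sketch of the whole function that builds and returns the tuples, assuming the column positions listed in the question and a single header row. It converts the required fields and the optional ones inside one try block, so a row is skipped if any required field is invalid or if an optional field is non-blank but not numeric (that last part is an assumption; the original code turned such values into 0 instead). The synthetic sample row is made up for illustration.

```python
import csv
import io


def read_file(fp):
    """Read the earthquake CSV into a list of tuples (sketch).

    Assumes one header row and the column indexes from the question.
    Blank deaths/missing/injuries/damages cells become 0; rows whose
    numeric fields cannot be converted, or that are too short, are skipped.
    """
    next(fp, None)  # skip the header row
    master_list = []

    for col in csv.reader(fp, skipinitialspace=True):
        try:
            year = int(col[2])
            month = int(col[3])
            magnitude = float(col[9])
            latitude = float(col[20])
            longitude = float(col[21])
            # Optional columns: blank -> 0, otherwise convert.
            deaths = int(col[23]) if col[23].strip() else 0
            missing = int(col[25]) if col[25].strip() else 0
            injuries = int(col[27]) if col[27].strip() else 0
            damages = float(col[29]) if col[29].strip() else 0
        except (ValueError, IndexError):
            continue  # bad numeric data or a short row: skip it

        location = col[19]
        master_list.append((year, month, magnitude, location, latitude,
                            longitude, deaths, missing, injuries, damages))

    return master_list


# Usage with a synthetic one-row file (only the columns read above are filled).
cells = [""] * 30
cells[2], cells[3], cells[9] = "2020", "3", "5.7"
cells[19], cells[20], cells[21] = "USA: UTAH", "40.751", "-112.078"
cells[29] = "48.5"
sample = io.StringIO("header\n" + ",".join(cells) + "\n")
print(read_file(sample))
# [(2020, 3, 5.7, 'USA: UTAH', 40.751, -112.078, 0, 0, 0, 48.5)]
```

Note that int() rejects text like "41.0"; if the real file stores counts that way, those rows would be skipped by this sketch.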
I don't understand the logic you're doing in the final section.

At line 53 (the "if isinstance(damages, float) == True:" check), you run the section only if damages is already a float. I would imagine that if it's already a float, you would want to leave it alone.

At lines 56-57 ("if damages:" followed by "damages = int('0')"), if damages is set to some non-zero value, you change it to zero. Why?

In several places you say int('0'). Why not just say 0 instead?
Reply to bowlofred's post (Nov-02-2020, 11:11 PM):

Well, that's the section that's giving me trouble, so that's the section I've been toying around with. I'm still learning and was just experimenting with different outcomes.

I'm trying to get it to check the cell: if the cell contains a positive float, that's the value; if it's blank, it needs to be 0.
And I don't understand why the last value in the Utah tuple comes out as 0 when the cell contains 48.5.
So perhaps something like this?

        if isinstance(damages, float) == True:
            damages = float(damages)
        else:
            damages = int('0')
But that still leaves my Utah tuple with a final value of 0.
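The reason that isinstance check never fires is that csv.reader returns every cell as a string, even cells that look like numbers. So isinstance(damages, float) is False for "48.5", and the branch that would call float() is skipped entirely. A quick check with an in-memory line:

```python
import csv
import io

# csv.reader always yields strings; a trailing comma gives an empty field.
row = next(csv.reader(io.StringIO("2020,48.5,\n")))
print(row)                               # ['2020', '48.5', '']
print(isinstance(row[1], float))         # False: it is the str '48.5'
print(isinstance(float(row[1]), float))  # True once converted
```

This is why testing the string itself (blank or not) and then attempting float() works, while testing its type does not.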
This is what ended up working for me:

        if damages:
            try:
                damages = float(damages)
            except:
                damages = int('0')
        else:
            damages = int('0')
Thanks for the suggestion!
As mentioned above, you really don't need to convert the string "0" to an integer on lines 5 and 7 (the two damages = int('0') assignments). Just use an integer 0!
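Combining that suggestion with the snippet that worked, a sketch of the damages conversion as a small function might look like this. The name parse_damages is made up for illustration; it uses a literal 0 and catches only ValueError (what float() raises on non-numeric text) instead of a bare except, so unrelated errors are not silently swallowed.

```python
def parse_damages(damages):
    """Return damages as a float, or 0 if the cell is blank or not numeric."""
    if not damages:
        return 0  # blank cell
    try:
        return float(damages)
    except ValueError:
        return 0  # non-numeric text, same behavior as the working snippet


print(parse_damages("48.5"))  # 48.5
print(parse_damages(""))      # 0
print(parse_damages("n/a"))   # 0
```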