Python Forum
Iterating over pandas.df to check for values out of range
#1
Hello!

I am taking an introductory course in Python programming at university (3 weeks in) and have been assigned to write a student grading script.

I am having problems iterating over the grade values in a pandas data frame. The grades consist of valid values (between -3.0 and 12) and invalid values (everything else, including NaN values). The idea is to find the invalid values and keep track of them in a 'memory' array of length len(original data) that starts as all ones and gets a zero at each row containing an invalid value (see code).
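The "including nan values" part matters for the condition used in the code below: in Python every comparison with NaN is False, so a test like `x < -3.0 or x > 12.0` never flags a missing grade. A minimal sketch of a validity check that also catches NaN, assuming the valid range is -3 to 12 inclusive as stated above (`is_valid_grade` is a hypothetical helper name, not part of the assignment):

```python
import math

def is_valid_grade(value):
    """Return True if value is a number in [-3, 12]; NaN counts as invalid."""
    # NaN fails every comparison, so it has to be checked explicitly first.
    if math.isnan(value):
        return False
    return -3.0 <= value <= 12.0

print(is_valid_grade(10.2))          # True
print(is_valid_grade(14))            # False
print(is_valid_grade(float("nan")))  # False

# The out-of-range style of test silently lets NaN through:
nan = float("nan")
print(nan < -3.0 or nan > 12.0)      # False
```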

I would love to upload files but I can't until I post 5 times.

The problematic code:
------------------------------------------------------------------------------------------------
for i in range(len(Grades)):

    if(i < -3.0 or i > 12.0):
        print("The grade value is out of range. Grade was {} in Line {}.".format(i, Rowcount))
        #modify -Valid- array per index, with zero if the if statement is satisfied.
        Valid[Rowcount]=0
    Rowcount+=1
------------------------------------------------------------------------------------------------

The Output:
------------------------------------------------------------------------------------------------

The grade value is out of range. Grade was 13 in Line 13.
The grade value is out of range. Grade was 14 in Line 14.
The grade value is out of range. Grade was 15 in Line 15.
The grade value is out of range. Grade was 16 in Line 16.
The grade value is out of range. Grade was 17 in Line 17.
The grade value is out of range. Grade was 18 in Line 18.
The grade value is out of range. Grade was 19 in Line 19.
------------------------------------------------------------------------------------------------

So it seems that the loop assigns the row number to -i-, and the -if- statement tests that row number rather than the actual value in the data frame.
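That diagnosis is right: `range(len(Grades))` yields the row positions 0, 1, 2, ..., never the grades. To test the values, the loop has to fetch each row (for example with `Grades.iloc[row_pos]`) and then loop over the grades in that row. A minimal sketch of the same Valid-array idea done that way, using three rows of grades from the sample data posted later in this thread (here `row_pos` plays the role of `Rowcount`):

```python
import numpy as np
import pandas as pd

# Three rows of grades from the sample data in this thread (NaN = missing grade).
Grades = pd.DataFrame([[2, 10.2, 3, 1, 0],
                       [9, 2, 14, -7, 0],
                       [7, 0, 15, 0, np.nan]])

# One entry per row: 1 = valid, set to 0 when an invalid grade is found.
Valid = np.ones(len(Grades))

for row_pos in range(len(Grades)):
    # row_pos is only the position; .iloc fetches the actual grades of that row.
    row = Grades.iloc[row_pos]
    for grade in row:
        # NaN fails every comparison, so it is tested separately.
        if np.isnan(grade) or grade < -3.0 or grade > 12.0:
            print("The grade value is out of range. Grade was {} in Line {}.".format(grade, row_pos))
            Valid[row_pos] = 0

print(Valid)
```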

I also tried using:
------------------------------------------------------------------------------------------------
for float in range(len(Grades)):
------------------------------------------------------------------------------------------------

This does not work either. I'm confused about how to set up for loops and apparently haven't understood them properly.
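Renaming the loop variable cannot change anything: the name after `for` is only a label for whatever the iterable produces, and `range(len(Grades))` always produces integer positions. Naming it `float` also shadows Python's built-in `float` type from that point on. A small sketch showing both effects, wrapped in a function (`demo` is just an illustrative name) so the shadowing stays local:

```python
def demo():
    collected = []
    for float in range(3):       # 'float' now just holds an int from range()
        collected.append(float)
    # After the loop, 'float' is the int 2, so calling it like the built-in fails.
    try:
        float("2.5")
        shadowed = False
    except TypeError:
        shadowed = True
    return collected, shadowed

print(demo())  # ([0, 1, 2], True)
```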

The full code is below.

Thank you very much in advance.

Regards

Spyder - Python 3.6.1 64bits, Qt 5.6.2, PyQt5 5.6 on Darwin on Mac OS 10.12.6



import numpy as np
import chardet
import pandas as pd

# Open and read csv file 
with open('testfilex2.csv', 'rb') as f:
    
    #detect encoding of csv file, assigning encoding to -Result-
    Result = chardet.detect(f.read())
    
# Use pandas to read the csv file with the encoding detected above
Data = pd.read_csv('testfilex2.csv', encoding=Result['encoding'], header= None)

# Display duplicated student ids found in the DataFrame -Data-
print(Data[Data.duplicated([0], keep=False)])

# Drop duplicated rows based on ID[0] and Names[1]
Data = Data.drop_duplicates([0], keep='last')
Data = Data.drop_duplicates([1], keep='last')



# Compute number of rows and columns
Columns = len(Data.columns)
Rows    = len(Data.index)

# Create a selection of columns to group data
Ids = Data.iloc[:, 0]
Names = Data.iloc[:, 1]

# The number of grade columns is not given in the assignment, so the code
# must work with any number of columns
Grades = Data.iloc[:, range(2,Columns)]

# Create an array of ones for indexing, in order to remove rows from the
# original df -Data- by setting entries to zero in the for/if loop below.
Valid = np.ones(len(Grades))   

# Create variable to keep track of rows, where invalid data might exist
Rowcount=0

# My problems start here...
for i in range(len(Grades)):
    
    
    if(i < -3.0 or i > 12.0):
        print("The grade value is out of range. Grade was {} in Line {}.".format(i, Rowcount))
        #modify -Valid- array per index, with zero if the if statement is satisfied.
        Valid[Rowcount]=0
    Rowcount+=1    

    
    
# Filter the original df -Data- with the Valid array to keep only valid rows
Data = Data[Valid == 1]
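For comparison, pandas can do the whole check without an explicit loop: compare the grade columns against the limits, mark missing grades with `isna()`, and reduce each row with `.any(axis=1)`. A minimal sketch, assuming the grades start at column 2 as in the script above and using three rows from the sample data in this thread; `find_invalid_rows` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

def find_invalid_rows(grades, low=-3.0, high=12.0):
    """Boolean Series: True for rows with any grade outside [low, high] or missing."""
    out_of_range = (grades < low) | (grades > high)   # NaN compares as False here
    missing = grades.isna()                           # so NaN is flagged separately
    return (out_of_range | missing).any(axis=1)

Data = pd.DataFrame([["s123456", "Luetta Macon", 2, 10.2, 3, 1, 0],
                     ["s123463", "Wilfredo Perrigo", 9, 2, 14, -7, 0],
                     ["s123465", "Layne Cousin", 7, 0, 15, 0, np.nan]])

Grades = Data.iloc[:, 2:]          # every column from the third one on
invalid = find_invalid_rows(Grades)

print(Data[invalid])               # rows that contain an invalid grade
valid_data = Data[~invalid]        # rows to keep

# The same information as the 0/1 Valid array in the script, if it is still needed:
Valid = np.where(invalid, 0, 1)
```

The mask-based version avoids the position bookkeeping (`Rowcount`) entirely, because the boolean Series carries the same row index as `Data`.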
#2
>>>I would love to upload files but I can't until I post 5 times.

You could paste some lines.
#3
Sure...

From the csv file:

s123456 Luetta Macon 2 10.2 3 1 0
s123457 Zola Metoyer 2.1 -2 0 0
s123458 Leora Ewan 3.7 -3 3 0 2
s123459 Deanne Fetter 6.8 -1 0 0 0
s123460 Onie Kyler 12 -2 2 5 0
s123461 Lynne Tomer 11.4 -1 2 0 3
s123462 Emmie Center 10.1 3 0 0 0
s123463 Wilfredo Perrigo 9 2 14 -7 0
s123464 Noreen Kriegel 5 4 1 0 3
s123465 Layne Cousin 7 0 15 0
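The pasted lines are separated by spaces, while `read_csv` splits on commas by default, and the names themselves contain a space. A minimal sketch of reading a few of these rows and flagging the invalid ones, assuming (hypothetically, since the real layout of testfilex2.csv isn't shown here) that the file uses commas, keeps each full name in one field, and simply has fewer fields when a grade is missing:

```python
import io
import pandas as pd

# Three of the rows above, rewritten with commas (assumed layout).
sample = io.StringIO(
    "s123456,Luetta Macon,2,10.2,3,1,0\n"
    "s123457,Zola Metoyer,2.1,-2,0,0\n"
    "s123463,Wilfredo Perrigo,9,2,14,-7,0\n"
)

# header=None as in the original script; the shorter row is padded with NaN.
Data = pd.read_csv(sample, header=None)
Grades = Data.iloc[:, 2:]

bad = Grades.isna() | (Grades < -3) | (Grades > 12)
print(Data[bad.any(axis=1)])   # Zola Metoyer (missing grade), Wilfredo Perrigo (14, -7)
```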
#4
Sorry, your program does not run on my PC.
Too many errors, so I can't help.
#5
OK, I'm assuming you loaded a similar csv file? The whole code depends on the csv file.

If not, try putting a # in front of the last line of code to comment it out. That line will not work because it depends on the for loop mentioned above.

The code works fine on my computer.

But thanks for your time nonetheless!
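On selecting rows with the Valid array: a DataFrame's plain `[]` does not accept NumPy-style `[rows, :]` indexing (such as `Data[Valid==1,:]`), so that form raises an error no matter how Valid was filled. With a boolean array of the same length as the frame, either `Data[mask]` or `Data.loc[mask]` selects the rows. A short sketch with a hand-made `Valid` array (its values are hypothetical, standing in for the result of the checking loop):

```python
import numpy as np
import pandas as pd

Data = pd.DataFrame({0: ["s123456", "s123457", "s123458"],
                     1: ["Luetta Macon", "Zola Metoyer", "Leora Ewan"]})

# Hypothetical result of the checking loop: the middle row was flagged invalid.
Valid = np.array([1.0, 0.0, 1.0])

kept = Data[Valid == 1]          # boolean mask inside plain []
kept_loc = Data.loc[Valid == 1]  # the same rows via .loc
print(kept)
```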
#6
I copied the lines below into testfilex2.csv
and got a lot of errors at line 18/19 of your Python code.
Maybe I have a different Python, NumPy, or...

Output:
s123456 Luetta Macon 2 10.2 3 1 0
s123457 Zola Metoyer 2.1 -2 0 0
s123458 Leora Ewan 3.7 -3 3 0 2
s123459 Deanne Fetter 6.8 -1 0 0 0
s123460 Onie Kyler 12 -2 2 5 0
s123461 Lynne Tomer 11.4 -1 2 0 3
s123462 Emmie Center 10.1 3 0 0 0
s123463 Wilfredo Perrigo 9 2 14 -7 0
s123464 Noreen Kriegel 5 4 1 0 3
s123465 Layne Cousin 7 0 15 0
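One plausible explanation for errors at those lines, assuming the lines were saved exactly as pasted (separated by spaces, not commas): `read_csv` then finds no commas and puts each whole line into a single column 0, so `drop_duplicates([1], keep='last')` asks for a column that does not exist and raises a `KeyError`. A small sketch to check that assumption:

```python
import io
import pandas as pd

# Two of the pasted lines, saved as-is with spaces instead of commas.
pasted = io.StringIO(
    "s123456 Luetta Macon 2 10.2 3 1 0\n"
    "s123457 Zola Metoyer 2.1 -2 0 0\n"
)
Data = pd.read_csv(pasted, header=None)
print(Data.shape)  # each whole line lands in column 0

try:
    Data.drop_duplicates([1], keep="last")
except KeyError as err:
    print("KeyError:", err)
```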
#7
OK, maybe it's the copy-pasting.

I'm running this; I don't know if that helps:

Spyder - Python 3.6.1 64bits, Qt 5.6.2, PyQt5 5.6 on Darwin on Mac OS 10.12.6

I am at 5 posts now, so I'll try to upload the csv file.
#8
OK, glad to hear
#9
I can now upload attachments. Here is the csv file the code needs:

Attached Files

testfilex2.csv (Size: 725 bytes)
#10
Thanks. This file is different from the one you pasted in post #7.
Tomorrow I will try it again.

