Python Forum
Iterating over pandas.df to check for values out of range
Hello!

I am taking an introductory course in Python programming at uni (3 weeks in) and have been assigned to write a student grading script.

I am having problems iterating over grade values in a pandas data frame. The grade values consist of valid values (values between -3.0 and 12) and invalid values (everything else, including NaN values). The idea is to find the invalid values and track them in a 'memory' array of ones (with length len(original data)), setting an entry to zero wherever the corresponding row contains an invalid value (see code).
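A note on the validity rule described above: if valid grades are assumed to lie in the closed interval [-3.0, 12.0] (the post does not say whether the end points are included), a chained comparison expresses the rule directly and also rejects NaN, because every comparison with NaN is False. The helper name `is_valid_grade` is hypothetical:

```python
# Assumption: valid grades lie in the closed interval [-3.0, 12.0].
LOW, HIGH = -3.0, 12.0

def is_valid_grade(value):
    """Return True if value lies inside [LOW, HIGH].

    Any comparison with NaN is False, so this chained check rejects NaN,
    whereas the test `value < LOW or value > HIGH` would NOT flag NaN.
    """
    return LOW <= value <= HIGH

# Compare the chained check with the "out of range" test for a few values.
for g in [-3.0, 0, 12, 12.5, -4, float("nan")]:
    print(g, is_valid_grade(g), g < LOW or g > HIGH)
```

The last column shows that the `or` test alone lets NaN through, so checking `not is_valid_grade(value)` covers both cases the post lists as invalid.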

I would love to upload files but I can't until I post 5 times.

The problematic code:
------------------------------------------------------------------------------------------------
for i in range(len(Grades)):
    if i < -3.0 or i > 12.0:
        print("The grade value is out of range. Grade was {} in Line {}.".format(i, Rowcount))
        # Set this row's entry in -Valid- to zero when the if statement is satisfied.
        Valid[Rowcount] = 0
    Rowcount += 1
------------------------------------------------------------------------------------------------

The Output:
------------------------------------------------------------------------------------------------

The grade value is out of range. Grade was 13 in Line 13.
The grade value is out of range. Grade was 14 in Line 14.
The grade value is out of range. Grade was 15 in Line 15.
The grade value is out of range. Grade was 16 in Line 16.
The grade value is out of range. Grade was 17 in Line 17.
The grade value is out of range. Grade was 18 in Line 18.
The grade value is out of range. Grade was 19 in Line 19.
------------------------------------------------------------------------------------------------

So it seems the loop assigns the row number to -i-, and the -if- statement tests that row number rather than the actual value in the data frame.
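That diagnosis matches how `range` works: `range(len(Grades))` yields the row positions 0, 1, 2, …, never the grades themselves. A minimal sketch of a loop that visits the actual cell values, assuming a small hypothetical stand-in for -Grades- and inclusive bounds of -3.0 and 12.0, uses `enumerate` for the row position and `DataFrame.itertuples(index=False)` for each row's values:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for -Grades-: two grade columns (labelled 2 and 3,
# as with header=None), one NaN and a few out-of-range values.
Grades = pd.DataFrame({2: [7, 13, 4, 10], 3: [12, 2, np.nan, -5]})

LOW, HIGH = -3.0, 12.0           # assumed inclusive bounds
Valid = np.ones(len(Grades))     # 1 = row still valid, 0 = row invalid

# enumerate() supplies the row position for indexing -Valid-;
# itertuples(index=False) yields each row's actual grade values.
for rowcount, row in enumerate(Grades.itertuples(index=False)):
    for value in row:
        # The chained comparison is False for NaN, so NaN counts as invalid too.
        if not (LOW <= value <= HIGH):
            print("The grade value is out of range. Grade was {} in line {}."
                  .format(value, rowcount))
            Valid[rowcount] = 0

print(Valid)
```

Here `enumerate` replaces the manual -Rowcount- counter, so the row position and the values being tested cannot drift apart.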

I also tried using:
------------------------------------------------------------------------------------------------
for float in range(len(Grades)):
------------------------------------------------------------------------------------------------

This does not work either. I'm confused about how to set up for loops and apparently haven't understood them properly.
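The loop variable in `for name in iterable:` is only a label bound to each item the iterable produces, so renaming it (even to `float`) cannot change what `range` yields. A short illustration with a hypothetical list of grades:

```python
grades = [7, 13, -5]  # hypothetical example values

# range(len(grades)) yields positions 0, 1, 2, not the grades themselves.
positions = [i for i in range(len(grades))]

# Iterating over the sequence directly yields the stored values.
values = [g for g in grades]

# enumerate() yields (position, value) pairs when both are needed.
pairs = list(enumerate(grades))

def shadow_demo():
    bound = []
    for float in range(len(grades)):
        # "float" is now just a loop variable holding 0, 1, 2; it also hides
        # the built-in float() for the rest of this function.
        bound.append(float)
    return bound

print(positions)      # [0, 1, 2]
print(values)         # [7, 13, -5]
print(pairs)          # [(0, 7), (1, 13), (2, -5)]
print(shadow_demo())  # [0, 1, 2]
```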

The full code is below...

Thank you very much in advance.

Regards

Spyder - Python 3.6.1 64bits, Qt 5.6.2, PyQt5 5.6 on Darwin on Mac OS 10.12.6



import numpy as np
import chardet
import pandas as pd

# Open and read csv file 
with open('testfilex2.csv', 'rb') as f:
    
    #detect encoding of csv file, assigning encoding to -Result-
    Result = chardet.detect(f.read())
    
# Use pandas to read the csv file with the encoding detected above
Data = pd.read_csv('testfilex2.csv', encoding=Result['encoding'], header= None)

# Display rows with duplicated student IDs (column 0) in the DataFrame -Data-
print(Data[Data.duplicated([0], keep=False)])

# Drop duplicated rows based on ID[0] and Names[1]
Data = Data.drop_duplicates([0], keep='last')
Data = Data.drop_duplicates([1], keep='last')



# Compute number of rows and columns
Columns = len(Data.columns)
Rows    = len(Data.index)

# Create a selection of columns to group data
Ids = Data.iloc[:, 0]
Names = Data.iloc[:, 1]

# The number of grade columns is not given in the assignment, so the code
# must work with any number of columns
Grades = Data.iloc[:, range(2,Columns)]

# Create an array of ones, one entry per row, used to remove rows from the
# original df -Data-: the for/if loop below sets an entry to zero for each invalid row.
Valid = np.ones(len(Grades))   

# Create variable to keep track of rows, where invalid data might exist
Rowcount=0

# My problem starts here...
for i in range(len(Grades)):
    if i < -3.0 or i > 12.0:
        print("The grade value is out of range. Grade was {} in Line {}.".format(i, Rowcount))
        # Set this row's entry in -Valid- to zero when the if statement is satisfied.
        Valid[Rowcount] = 0
    Rowcount += 1

# Keep only the rows of the original df -Data- whose -Valid- entry is still 1
Data = Data[Valid == 1]
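As an alternative to an explicit loop, pandas can run the range test on the whole block of grades at once. The sketch below assumes the same layout as the script (ID in column 0, name in column 1, grades from column 2 on), inclusive bounds of -3.0 and 12.0, and a small hypothetical data frame in place of the CSV file; it uses `DataFrame.ge`, `DataFrame.le` and `DataFrame.all(axis=1)`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for -Data- (same column layout as header=None).
Data = pd.DataFrame({
    0: ["s1", "s2", "s3", "s4"],   # student IDs
    1: ["Ann", "Bo", "Cy", "Di"],  # names
    2: [7, 13, 4, 10],             # grades
    3: [12, 2, np.nan, -5],        # grades
})

LOW, HIGH = -3.0, 12.0          # assumed inclusive bounds
Grades = Data.iloc[:, 2:]       # every column from the third one on

# Element-wise range test; NaN compares False, so it is marked invalid.
in_range = Grades.ge(LOW) & Grades.le(HIGH)

# A row is valid only if every grade in it is in range.
Valid = in_range.all(axis=1)

# Report each invalid cell by row position and value.
rows, cols = np.nonzero(~in_range.values)
for r, c in zip(rows, cols):
    print("The grade value is out of range. Grade was {} in line {}."
          .format(Grades.iat[r, c], r))

# Boolean Series aligned on the index of -Data-: keeps only valid rows.
Data = Data[Valid]
print(Data)
```

Because `Valid` here is a boolean Series sharing the index of -Data-, `Data[Valid]` still selects the right rows after `drop_duplicates` has left gaps in the index.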
Iterating over pandas.df to check for values out of range - by Padowan - Nov-25-2017, 05:23 PM
