 Iterating over pandas.df to check for values out of range
#1
Hello!

I am three weeks into an introductory Python programming course at university and have been assigned to write a student grading script.

I am having problems iterating over the grade values in a pandas data frame. Valid grades lie between -3.0 and 12; everything else, including NaN values, is invalid. The idea is to find the invalid values and track them in a 'memory' array of ones, the same length as the original data, by setting the entry for each offending row to zero (see code).

I would love to upload files but I can't until I post 5 times.

The problematic code:
------------------------------------------------------------------------------------------------
for i in range(len(Grades)):

    if(i < -3.0 or i > 12.0):
        print("The grade value is out of range. Grade was {} in Line {}.".format(i, Rowcount))
        # Modify the -Valid- array at this index, setting it to zero if the if statement is satisfied.
        Valid[Rowcount]=0
    Rowcount+=1
------------------------------------------------------------------------------------------------

The Output:
------------------------------------------------------------------------------------------------

The grade value is out of range. Grade was 13 in Line 13.
The grade value is out of range. Grade was 14 in Line 14.
The grade value is out of range. Grade was 15 in Line 15.
The grade value is out of range. Grade was 16 in Line 16.
The grade value is out of range. Grade was 17 in Line 17.
The grade value is out of range. Grade was 18 in Line 18.
The grade value is out of range. Grade was 19 in Line 19.
------------------------------------------------------------------------------------------------

Thus it seems that the code is assigning row number to -i- and using this row value with regard to the -if- statement and not the actual value in the dataframe.
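That reading matches how range behaves: range(len(Grades)) only produces the row positions 0, 1, 2, ..., so the comparison never sees a grade. Below is a minimal sketch of the difference, using a made-up two-column frame (the name grades and its values are assumptions, not the poster's data). Note that a NaN cell fails both comparisons, so this particular test would not flag it.

```python
import pandas as pd

# Hypothetical two-column grade table standing in for the poster's -Grades-.
grades = pd.DataFrame({"a": [2.0, 14.0], "b": [10.2, -7.0]})

# range(len(...)) yields row positions only, never the cell values.
positions = list(range(len(grades)))

# To test the grades themselves, index into the frame with each position.
out_of_range = []
for row in range(len(grades)):
    for value in grades.iloc[row]:
        if value < -3.0 or value > 12.0:
            out_of_range.append((row, value))

print(positions)
print(out_of_range)
```

Here the loop variable row still comes from range, but it is used to look up the row with .iloc rather than being compared against the limits directly.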

I also tried using:
------------------------------------------------------------------------------------------------
for float in range(len(Grades)):
------------------------------------------------------------------------------------------------

This does not work either. I'm confused about how to set up for loops and apparently haven't understood them properly.

The entirety of the code can be found below...

Thank you very much in advance.

Regards

Spyder - Python 3.6.1 64bits, Qt 5.6.2, PyQt5 5.6 on Darwin on Mac OS 10.12.6



import numpy as np
import chardet
import pandas as pd

# Open and read csv file 
with open('testfilex2.csv', 'rb') as f:
    
    #detect encoding of csv file, assigning encoding to -Result-
    Result = chardet.detect(f.read())
    
# Use pandas to read the csv file with the encoding detected above
Data = pd.read_csv('testfilex2.csv', encoding=Result['encoding'], header= None)

# Display duplicates (student ids) found in the DataFrame -Data-
print(Data[Data.duplicated([0], keep=False)])

# Drop duplicated rows based on ID[0] and Names[1]
Data = Data.drop_duplicates([0], keep='last')
Data = Data.drop_duplicates([1], keep='last')



# Compute number of rows and columns
Columns = len(Data.columns)
Rows    = len(Data.index)

# Create a selection of columns to group data
Ids = Data.iloc[:, 0]
Names = Data.iloc[:, 1]

# The amount of grades columns is unknown in the assignment, I must therefore
# create code that will work with x amount of columns
Grades = Data.iloc[:, range(2,Columns)]

# Create an array of ones for indexing, in order to remove rows from the
# original df -Data- by setting entries to zero in the for/if block below.
Valid = np.ones(len(Grades))   

# Create variable to keep track of rows, where invalid data might exist
Rowcount=0

# My problems start here...
for i in range(len(Grades)):

    if(i < -3.0 or i > 12.0):
        print("The grade value is out of range. Grade was {} in Line {}.".format(i, Rowcount))
        # Modify the -Valid- array at this index, setting it to zero if the if statement is satisfied.
        Valid[Rowcount]=0
    Rowcount+=1

    
    
# Modify the original df -Data- by the Valid array, in order to obtain valid data
Data=Data[Valid==1,:]
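
One possible way to get the intended result without an explicit loop is to compare the whole grade block at once and keep only the rows where every grade passes. This is a sketch under assumptions, not the assignment's required solution: the small frame below is invented to stand in for the CSV, the names in_range, valid and clean are made up, and the grade columns are assumed to be numeric already. A DataFrame also does not accept the NumPy-style Data[mask, :] selection; Data[mask] or Data.loc[mask, :] is the usual form.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: id, name, then two grade columns.
data = pd.DataFrame({
    0: ["s1", "s2", "s3", "s4"],
    1: ["Ann", "Bob", "Cid", "Dee"],
    2: [2.0, 12.0, 9.0, 7.0],
    3: [10.2, -2.0, 14.0, np.nan],
})
grades = data.iloc[:, 2:]

# Element-wise test; any comparison with NaN is False, so NaN counts as invalid.
in_range = (grades >= -3.0) & (grades <= 12.0)

# Report each offending cell by its row position.
rows, cols = np.where(~in_range.to_numpy())
for r, c in zip(rows, cols):
    print("The grade value is out of range. Grade was {} in Line {}.".format(grades.iat[r, c], r))

# A row is valid only if all of its grades are in range.
valid = in_range.all(axis=1)

# Boolean row selection on a DataFrame.
clean = data[valid]
```

Because the mask is built from the grade values themselves, no separate row counter or array of ones is needed.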
#2
>>>I would love to upload files but I can't until I post 5 times.

You could paste some lines.
#3
Sure...

from the csv file..

s123456 Luetta Macon 2 10.2 3 1 0
s123457 Zola Metoyer 2.1 -2 0 0
s123458 Leora Ewan 3.7 -3 3 0 2
s123459 Deanne Fetter 6.8 -1 0 0 0
s123460 Onie Kyler 12 -2 2 5 0
s123461 Lynne Tomer 11.4 -1 2 0 3
s123462 Emmie Center 10.1 3 0 0 0
s123463 Wilfredo Perrigo 9 2 14 -7 0
s123464 Noreen Kriegel 5 4 1 0 3
s123465 Layne Cousin 7 0 15 0
#4
Sorry, your program does not run on my PC.
Too many errors, so I can't help.
#5
OK, I'm assuming that you loaded a similar csv file? The whole code depends on the csv file.

If not, try inserting a # in front of the last line of code. This last line will not work because it is dependent on the aforementioned for loop.

The code works fine on my computer..

But thanks for your time nonetheless!
#6
I copied the lines below into testfilex2.csv
and got a lot of errors at lines 18/19 of your Python code.
Maybe I have a different Python, numpy, or...

Output:
s123456 Luetta Macon 2 10.2 3 1 0
s123457 Zola Metoyer 2.1 -2 0 0
s123458 Leora Ewan 3.7 -3 3 0 2
s123459 Deanne Fetter 6.8 -1 0 0 0
s123460 Onie Kyler 12 -2 2 5 0
s123461 Lynne Tomer 11.4 -1 2 0 3
s123462 Emmie Center 10.1 3 0 0 0
s123463 Wilfredo Perrigo 9 2 14 -7 0
s123464 Noreen Kriegel 5 4 1 0 3
s123465 Layne Cousin 7 0 15 0
#7
OK, maybe it's the copy-pasting business..

I'm running this; I don't know if that helps.

Spyder - Python 3.6.1 64bits, Qt 5.6.2, PyQt5 5.6 on Darwin on Mac OS 10.12.6

I am at 5 posts now, I'll try to upload the csv file..
#8
OK, glad to hear
#9
I can now upload attachments, here is the needed csv file..


Attached Files
.csv   testfilex2.csv (Size: 725 bytes / Downloads: 80)
#10
Thanks. This file is different from the one you pasted in post #7.
Tomorrow I will try it again.
