Nov-26-2017, 04:26 PM
Well yes, but originally it was the for loop that was the problem...
Now I have figured out how to iterate over the values in the data. But the code is not modifying the "memory" index array (which I called -Valid-) properly, i.e. with a zero for every occurrence of an invalid value. Instead of putting a zero only at the respective row and column index, it turns all the values of a column in -Valid- from 1 to zero.
The following runs without Python errors for me:
import numpy as np
import chardet
import pandas as pd

# Open and read csv file
with open('testfilex1w.csv', 'rb') as f:
    # Detect encoding of csv file, assigning encoding to -Result-
    Result = chardet.detect(f.read())

# Use pandas to read csv file relative to above detected encoding
Data = pd.read_csv('testfilex1w.csv', encoding=Result['encoding'], header=None)

# Display duplicates (student ids) found in pd.dataframe -Data-
print(Data[Data.duplicated([0], keep=False)])

# Drop duplicated rows based on IDs[0] and Names[1], IDs and Names are defined
# below
Data = Data.drop_duplicates([0], keep='last')
Data = Data.drop_duplicates([1], keep='last')

# Compute number of rows and columns of original data
Columns = len(Data.columns)
Rows = len(Data.index)

# Create a selection of columns to group data
Ids = Data.iloc[:, 0]
Names = Data.iloc[:, 1]

# The number of grade columns is unknown in the assignment, I must therefore
# create code that will work with x number of columns
Grades = Data.iloc[:, range(2, Columns)]

# Compute number of rows and columns of duplicate ridden grades
Gradecols = len(Grades.columns)
Graderows = len(Grades.index)
Elements = (Gradecols * Graderows)

# Create a "memory" array of ones for indexing, in order to remove rows from
# original grades data by modifying this array with zeros based on below for/if loop.
Valid = pd.DataFrame(np.ones((len(Grades.index), len(Grades.columns))))

# Create variable to keep track of rows, where invalid data might exist
Rowcount = 0

# Create matrix array of grades... because I can't figure out how to iterate over
# a pandas dataframe :(
Gradesnp = Grades.values

# For loop with if statements in order to find occurrences of values out of
# range and modify the memory array with zeros so that I can remove out of range
# values from original grades data.
for row in Gradesnp:
    for item in row:
        if (item < -3.0 or item > 12.0):
            print("Invalid grade found! Grade was {} in Line {}.".format(item, Rowcount))
            # Modify -Valid- array per index, with zero if the if statement is satisfied.
            Valid[Rowcount] = 0
    Rowcount += 1

# Through the use of Boolean indexing, the data array is modified by the
# memory vector (-Valid-) so that values corresponding to 0 are excluded and values
# corresponding to 1 are included.
#Validgrades=Gradesnp[Valid==1,:]
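
A minimal sketch of what seems to be going wrong, using a small made-up grades table rather than the real CSV (the data and the names `grades`, `valid_by_column`, `valid` and `in_range` are assumptions for illustration only). On a pandas DataFrame, `Valid[k] = 0` selects the column labelled `k` and overwrites all of it, which would explain whole columns turning to zero. Addressing a single cell by position with `.iloc[row, col]`, or building a boolean mask directly, may be closer to what is intended:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the grade columns (not the real CSV data).
grades = pd.DataFrame([[7.0, 10.0, 4.0],
                       [12.0, 20.0, 2.0],   # 20.0 is above 12
                       [-3.0, 0.0, -5.0],   # -5.0 is below -3
                       [2.0, 7.0, 12.0]])

# What the original assignment does: df[k] = 0 zeroes the whole COLUMN k.
valid_by_column = pd.DataFrame(np.ones(grades.shape))
valid_by_column[1] = 0

# Addressing one cell by position: .iloc[row position, column position].
valid = pd.DataFrame(np.ones(grades.shape))
for row_pos, row in enumerate(grades.values):
    for col_pos, item in enumerate(row):
        if item < -3.0 or item > 12.0:
            print("Invalid grade found! Grade was {} in Line {}.".format(item, row_pos))
            valid.iloc[row_pos, col_pos] = 0

# Loop-free alternative: element-wise comparisons give a True/False mask
# with the same shape and index as the grades.
in_range = (grades >= -3.0) & (grades <= 12.0)

# Keep only the rows in which every grade is in range.
valid_grades = grades[in_range.all(axis=1)]
print(valid_grades)
```

The mask version keeps the original row index of `grades`, which may matter here because the rows left after `drop_duplicates` are not necessarily numbered 0, 1, 2, ... any more.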