Aug-05-2017, 06:43 PM
Hello, I'm fairly new to Python, and especially new to NumPy.
Mission: I was tasked with writing a function that takes an arbitrary number of files as input, each of which contains an array filled with random numbers. The arrays in the files are all the same size. For each cell in the array, take that same cell from each file, find the mean of those values, and return the results as an array with the same dimensions as the arrays inside the files.
My solution: I'm really new to NumPy, and many of the attempts I've made in the last few days have been dead ends or too complicated to manage. I've tried putting all the rows in an array and calculating the mean of each, but then I didn't know how to get them back into shape. Anyway, please look at the code below, which is my solution, and give me any feedback toward a more readable, elegant solution. What I really want is to be able to use NumPy's mean function, since it's more optimized than mine.
import numpy as np

def mean_datasets(input_files):
    # Read-In Files to get array shape and file count
    file_count = 0
    for file in input_files:
        data = np.genfromtxt(file, delimiter=',')
        columns = data.shape[1]
        rows = data.shape[0]
        file_count += 1

    # Initialize the calculation array with file information
    calc_array = np.zeros([rows,columns])

    # Go through each file and sum the same cell per file
    for file in input_files:
        data = np.genfromtxt(file, delimiter=',')
        row_num = 0
        for row in data:
            column_num = 0
            for cell in row:
                calc_array[row_num,column_num] += cell
                column_num += 1
            row_num += 1

    # Go through the calculation array and find the mean for each cell
    row_num = 0
    for row in calc_array:
        column_num = 0
        for cell in row:
            calc_array[row_num, column_num] = round(cell/file_count, 1)
            column_num += 1
        row_num += 1

    return calc_array

test_datasets = mean_datasets(['data1.csv', 'data2.csv', 'data3.csv']) #'data4.csv', 'data5.csv', 'data6.csv'])
print(test_datasets)

Here's an example dataset:
-9.4610,-0.9349,8.5322,1.0458
0.6367,-3.5322,0.5127,-3.8569
3.9008,7.1903,-9.1945,-4.0130

Thank you for any help, advice, feedback!
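For reference, one possible direction for using NumPy's own mean is to stack the per-file arrays along a new leading axis and average over that axis, which keeps the original rows-by-columns shape without any reshaping. The following is only a minimal sketch under a few assumptions: every file parses to a 2-D array of the same shape, the rounding to one decimal place from the code above is still wanted, and the names `sample_a` and `sample_b` (with their values) are made up for illustration, standing in for real CSV files.

```python
import io

import numpy as np


def mean_datasets(input_files):
    """Return the cell-wise mean of same-shaped CSV arrays, rounded to 1 decimal."""
    # Read every file once; each entry is a (rows, columns) array.
    arrays = [np.genfromtxt(f, delimiter=',') for f in input_files]
    # Stack along a new leading axis: shape (file_count, rows, columns).
    # np.stack raises ValueError if the arrays do not all share one shape.
    stacked = np.stack(arrays)
    # Averaging over axis 0 collapses the file axis, leaving (rows, columns).
    return np.round(stacked.mean(axis=0), 1)


# Hypothetical in-memory stand-ins for CSV files (np.genfromtxt also accepts
# file-like objects); real usage would pass paths such as 'data1.csv'.
sample_a = io.StringIO("1.0,2.0\n3.0,4.0")
sample_b = io.StringIO("3.0,6.0\n5.0,8.0")

result = mean_datasets([sample_a, sample_b])
print(result)
```

With real files the call would look the same as before, for example `mean_datasets(['data1.csv', 'data2.csv', 'data3.csv'])`. Holding all arrays in memory at once is a trade-off; for many large files, keeping a running sum as in the original code and dividing once at the end may be the better fit.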