Python Forum

Retrieving a column from a data set using a function
Hello all, this is my first time here so apologies if something is missed or in the wrong place.

I have recently started learning to code and am struggling to write a function that returns all the data from one column of a data set, given the data set and a column number as inputs. I am completely stuck, and any help would be appreciated.

My function starts like this:

def get_column(data, col_num): 
I already have the data, so it does not need to be imported. col_num is the column number I would pass in, for example:

print(get_column(data, 5))
and it would return the 6th column (counting column numbers from zero).

Thanks.
What is data? You call it a data set, but I am not familiar with that data type. Is it a Pandas DataFrame? Please provide more information about your data. If possible, provide the code that creates data.
(Oct-06-2021, 06:08 PM) deanhystad wrote: What is data? You call it a data set, but I am not familiar with that data type. Is it a Pandas DataFrame? Provide more information about your data please. If possible provide the code that creates data

Sorry, the data is a .csv file. I am not currently using Pandas, but I am using NumPy.
Could you please provide the code that shows how you are reading the CSV file?

If data is a 2D NumPy array, getting column 2 is as simple as column = data[:, 2]. Column numbers start at 0, so this is the third column:
import numpy as np

data = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Print the columns one at a time.
for i in range(len(data[0])):
    print(i, data[:, i])
Output:
0 [1 5 9]
1 [ 2  6 10]
2 [ 3  7 11]
3 [ 4  8 12]
I am reading the file like this:

data = np.loadtxt('data.csv', delimiter = ',', skiprows = 1)
I need my function to return a single column of data when I call it.

For example, print(get_column(data, 3)) should call the function and give me the data from column 4 only.
If your data.csv file looks like mine, you can get columns the same way. No need to write a function.
import numpy as np

# data.csv looks like this: "these, are, column, headers\n1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12"
data = np.loadtxt('data.csv', delimiter = ',', skiprows = 1)
print(f'np Array\n{data}\n\nColumns')

# Print the columns one at a time.
for i in range(len(data[0])):
    print(i, data[:, i])  # You don't need a function.  This does all you need
Output:
np Array
[[ 1.  2.  3.  4.]
 [ 5.  6.  7.  8.]
 [ 9. 10. 11. 12.]]

Columns
0 [1. 5. 9.]
1 [ 2.  6. 10.]
2 [ 3.  7. 11.]
3 [ 4.  8. 12.]
But if you need to write a function, this is all it takes:
def get_column(data, column):
    # Select every row (:) of the given zero-based column.
    return data[:, column]
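For completeness, here is one way the function could be exercised end to end. This is only a sketch: the in-memory csv_text is a hypothetical stand-in for data.csv so the example runs without a file, and the np.atleast_2d call and the range check are additions that the thread does not ask for. The check rejects negative column numbers, which plain NumPy indexing would otherwise accept and count from the end.

```python
import io

import numpy as np


def get_column(data, col_num):
    """Return column `col_num` (zero-based) of a 2D NumPy array."""
    # A file with a single data row loads as a 1D array; treat it as one row.
    data = np.atleast_2d(data)
    if not 0 <= col_num < data.shape[1]:
        raise IndexError(f"col_num must be between 0 and {data.shape[1] - 1}")
    return data[:, col_num]


# Hypothetical stand-in for data.csv (assumption: one header row, comma-separated).
csv_text = "these,are,column,headers\n1,2,3,4\n5,6,7,8\n9,10,11,12\n"
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

fourth = get_column(data, 3)  # index 3 is the 4th column
print(fourth)  # prints [ 4.  8. 12.]
```

Because NumPy counts columns from 0, get_column(data, 3) returns the 4th column. If you would rather pass 1-based column numbers, subtract 1 from col_num inside the function.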
Thank you so much for the help! It works perfectly now!