Python Forum

Reshaping a single column into multiple columns using Python
I have an Excel file containing a single column (the number of rows is not fixed). Using Python 3, I want to:
  1. Import my Excel file/data into Python,
  2. Read/select the data column (first column),
  3. Reshape this column into multiple columns with 10 rows in each column, and finally
  4. Write the output to a new Excel file.
I am a MATLAB user and know how to do this in MATLAB using its reshape command:
Quote:newVar = reshape(myColumn, 10, [])

I am looking for help achieving this in Python 3.
We will help you, but you need to show effort and let us know where you get stuck. There is more than one way to do this, so my suggestions are only how I would approach the problem. As a start, use Pandas to import the file into a dataframe. Then use the iloc indexer to select your range in the desired column, and copy that data to the new column.

See Pandas documentation
Ok, I have tried but could not get the desired result.
import pandas as pd
import numpy as np
df = pd.read_excel('sample.xlsx')
first_column = pd.DataFrame(df.iloc[:, 0])
arr = np.array(first_column)
newArr = arr.reshape(arr, (10, -1))
The code raises the following error:
Quote:newArr = arr.reshape(10, -1)
Quote:TypeError: only integer scalar arrays can be converted to a scalar index
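You are calling reshape as a method but passing the array in again as the first argument. The function numpy.reshape takes the array first and the shape second; the method arr.reshape takes only the shape. The call is "new_array = numpy.reshape(arr, (10, -1))" or, as a method, "new_array = arr.reshape(10, -1)".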
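For comparison, here is a minimal sketch of the two calling conventions NumPy offers for reshape. The array below is a made-up stand-in for the values read from the spreadsheet, not the poster's data.

```python
import numpy as np

# Hypothetical stand-in for the 30 values read from the sheet.
arr = np.arange(30)

# Function form: the array is the first argument, the shape the second.
a = np.reshape(arr, (10, -1))

# Method form: the array is already bound, so only the shape is passed.
b = arr.reshape(10, -1)

print(a.shape, b.shape)

# Passing the array again to the method makes NumPy treat it as part of
# the shape, which fails. The thread reports a TypeError; ValueError is
# caught as well purely as a precaution across NumPy versions.
raised = False
try:
    arr.reshape(arr, (10, -1))
except (TypeError, ValueError) as exc:
    raised = True
    print(type(exc).__name__, exc)
```

Both forms give the same result, so the choice between them is a matter of style.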
(Jun-19-2022, 07:20 PM) deanhystad wrote: The call is "new_array = numpy.reshape(arr, (10, -1))".

Hi, thanks for your explanation. I have tried this, but it still does not give the desired result.

import pandas as pd
import numpy as np
df = pd.read_excel('sample.xlsx')
myCol = pd.DataFrame(df.iloc[:, 0])
arr = np.array(myCol)
newArr = np.reshape(arr, (10, -1), order='F')
np.savetxt("newMatrix.csv", newArr, delimiter=",")
I have found something strange. If I add one more row to my column, the same code works as intended.
Please note that I now have 31 rows. It seems to me that this has something to do with indexing.
By default, the pandas read_excel() function uses the first row as a header, so that row is not part of the data. If your sheet starts with data in the first row, add the parameter header=None.
It always helps to read the documentation, especially when trying something new.

https://pandas.pydata.org/pandas-docs/st...excel.html

Quote:header : int, list of int, default 0
Row (0-indexed) to use for the column labels of the parsed DataFrame. If a list of integers is passed those row positions will be combined into a MultiIndex. Use None if there is no header.
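The effect of the header default on the row count can be sketched as follows. This is only an illustration under stated assumptions: pd.read_csv with an in-memory string is used as a stand-in so that no spreadsheet file is needed (with no column names given, its default also takes the first row as the header), and the 30 values are made-up sample data.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample: 30 data values, one per line, no header row.
text = "\n".join(str(i) for i in range(1, 31))

# Default: the first value is consumed as the column label, leaving 29 rows.
with_default = pd.read_csv(io.StringIO(text))

# header=None: every line is treated as data, so all 30 rows survive.
without_header = pd.read_csv(io.StringIO(text), header=None)

print(len(with_default), len(without_header))

# 29 values cannot be split into columns of 10, so the reshape fails.
failed = False
try:
    np.reshape(with_default.iloc[:, 0].to_numpy(), (10, -1), order="F")
except ValueError as exc:
    failed = True
    print("ValueError:", exc)

# With all 30 values the column-major reshape gives 10 rows by 3 columns.
values = without_header.iloc[:, 0].to_numpy()
new_arr = np.reshape(values, (10, -1), order="F")
print(new_arr.shape)
```

This would also explain why adding a 31st row made the original code work: one row is taken as the header and 30 values remain.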

https://numpy.org/doc/stable/reference/g...shape.html

Quote:numpy.reshape(a, newshape, order='C')
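Since the order argument decides how the values are laid out, a short sketch may help. order='F' fills column by column, which is how MATLAB's reshape behaves, while the default order='C' fills row by row. The helper reshape_columns and its NaN padding are assumptions added here for the case where the number of rows is not a multiple of 10; they are not something the thread itself proposes.

```python
import numpy as np

values = np.arange(1, 31)

# Default order='C' fills row by row: the first row is 1, 2, 3.
by_rows = np.reshape(values, (10, -1))

# order='F' fills column by column: the first column is 1..10.
by_cols = np.reshape(values, (10, -1), order="F")

print(by_rows[0], by_cols[:, 0])


def reshape_columns(data, rows=10, fill=np.nan):
    """Hypothetical helper: column-major reshape into `rows` rows,
    padding the last column with `fill` when the length does not divide evenly."""
    data = np.asarray(data, dtype=float).ravel()
    n_cols = -(-data.size // rows)  # ceiling division
    padded = np.full(rows * n_cols, fill, dtype=float)
    padded[:data.size] = data
    return padded.reshape((rows, n_cols), order="F")


# 25 values become 10 rows by 3 columns, with the last 5 cells padded.
out = reshape_columns(np.arange(1, 26))
print(out.shape)
```

To write the result to a workbook, one option is pd.DataFrame(out).to_excel('reshaped.xlsx', header=False, index=False); the filename is hypothetical, and an Excel writer engine such as openpyxl has to be installed.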