Posts: 36
Threads: 19
Joined: Mar 2022
I Have a file comprising 3 columns of data and I would like to transform these columns into a single column, one following the other e.g. below. Could someone kindly tell me how to write the code to do this. It would be much appreciated and get me out of a bind.
3560181.703 -3384947.321 -5.02245
3560191.703 -3384947.321 -4.88665
3560201.703 -3384947.321 -4.76042
Reformat to:
3560181.703
3560191.703
3560201.703
-3384947.321
-3384947.321
-3384947.321
-5.02245
-4.88665
-4.76042
Posts: 1,094
Threads: 143
Joined: Jul 2017
What've you got? Excel? Pandas df, csv?
You could just read the each row, column by column, into a list, then write that list to a single column.
Or dispense with the list and read each value straight into column 1 in a new file.
Posts: 36
Threads: 19
Joined: Mar 2022
It is a csv far to large for excel to handle, 1 million+ rows.
I just don't have python skills to do this and I can think of no other way.
Posts: 1,094
Threads: 143
Joined: Jul 2017
Big data needs Pandas, I believe. I never had anything more than 200 rows.
I believe Excel is limited to 4000+ columns, but rows are unlimited.
Anyway, this will give you 3 csv files, 1 for each column. You can then just concatenate them if you wiish
import pandas as pd
csv_file = '/home/pedro/myPython/pandas/testme.csv'
savepath = '/home/pedro/myPython/pandas/'
df = pd.read_csv(csv_file)
# get each column
columns = ['col1', 'col2', 'col3'] # put your column headers
for colnr in columns:
mydf = df[colnr]
mydf.to_csv(savepath + colnr + '.csv', index=False)
Posts: 300
Threads: 72
Joined: Apr 2019
Have a look to " flatten" or " ravel" using Numpy, usinf a single line and going very fast even with million of rows
Posts: 36
Threads: 19
Joined: Mar 2022
Many thanks Pedroski55
I'm getting the following error on this code. Would you have any idea of the problem.
Error: C:\Python3\python.exe C:/Users/David/PycharmProjects/pythonProject2/venv/Subspend_Z.py
Traceback (most recent call last):
File "C:\Python3\lib\site-packages\pandas\core\indexes\base.py", line 2889, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 97, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'col1'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:/Users/David/PycharmProjects/pythonProject2/venv/Subspend_Z.py", line 11, in <module>
mydf = df[colnr]
File "C:\Python3\lib\site-packages\pandas\core\frame.py", line 2899, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\Python3\lib\site-packages\pandas\core\indexes\base.py", line 2891, in get_loc
raise KeyError(key) from err
KeyError: 'col1'
Process finished with exit code 1
Posts: 1,094
Threads: 143
Joined: Jul 2017
Apr-04-2022, 08:00 AM
(This post was last modified: Apr-04-2022, 08:00 AM by Pedroski55.)
I don't know what your column header names are.
Either call the columns col1, col2, col3 etc., or put your column header names in here, for any columns you want to single out.
# get each column
columns = ['col1', 'col2', 'col3'] # put your column headers That is to say, look in row 1 of your csv and see what the names of the columns you want to get are.
If there are no column headers, I believe Pandas will set something like col1, col2 ... but I don't really use Pandas for my little amounts of data.
Best set the names you want in row 1.
Posts: 36
Threads: 19
Joined: Mar 2022
Thank you so much Pedroski55, it worked a treat.
The only issue is that the transformation to individual CVS files left off the decimal places. Is there something that I can include in the script that will bring in the values say to 2 decimal places.
Kind regards
Dave
Posts: 1,094
Threads: 143
Joined: Jul 2017
I am really not familiar with Pandas, never having large amounts of data to process, but this seems to work
pd.options.display.float_format = '{:,.2f}'.format (change the 2 for more decimal places I believe)
import pandas as pd
csv_file = '/home/pedro/myPython/pandas/testme.csv'
savepath = '/home/pedro/myPython/pandas/'
pd.options.display.float_format = '{:,.2f}'.format
df = pd.read_csv(csv_file)
# get each column
columns = ['col1', 'col2', 'col3'] # put your column headers
for colnr in columns:
mydf = df[colnr]
mydf.to_csv(savepath + colnr + '.csv', index=False)
Posts: 1
Threads: 0
Joined: Mar 2025
Mar-19-2025, 03:46 AM
(This post was last modified: Mar-19-2025, 04:10 AM by buran.)
Your code effectively formats float values and saves each column as a separate CSV file. Just ensure the column names match your CSV file to avoid errors. Nice work! 👍😊
buran write Mar-19-2025, 04:10 AM:Spam link removed
|