Python Forum
deleting columns in CSV file - Printable Version




deleting columns in CSV file - astral_travel - Nov-25-2022

is there a way, with pandas or otherwise, to specify a range of columns to be deleted without writing the name/header of each column, as pop() and drop() require ?


RE: deleting columns in CSV file - snippsat - Nov-25-2022

import pandas as pd

data = {
    1: [1, 19, 20, 21, 25, 29, 30, 31, 30, 29, 31],
    2: [2, 10, 20, 20, 20, 10, 10, 20, 20, 10, 10],
    3: [3, 10, 20, 20, 20, 10, 10, 20, 20, 10, 10],
    4: [4, 10, 20, 20, 20, 10, 10, 20, 20, 10, 10],
}

df = pd.DataFrame(data)
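Delete the columns at positions 2 and 3 (the slice 2:4 is 0-based and excludes the end, so here these are the columns labelled 3 and 4); adjust as needed.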
>>> df
     1   2   3   4
0    1   2   3   4
1   19  10  10  10
2   20  20  20  20
3   21  20  20  20
4   25  20  20  20
5   29  10  10  10
6   30  10  10  10
7   31  20  20  20
8   30  20  20  20
9   29  10  10  10
10  31  10  10  10

>>> df.drop(df.columns[2:4], axis=1, inplace=True)
>>> df
     1   2
0    1   2
1   19  10
2   20  20
3   21  20
4   25  20
5   29  10
6   30  10
7   31  20
8   30  20
9   29  10
10  31  10
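
A variation on the same idea, as a minimal sketch: it uses a shortened version of the frame above, and the names to_drop, trimmed and kept are only illustrative. It avoids inplace=True by assigning the result, and also shows selecting the columns to keep by position with iloc, which should give the same frame.

```python
import pandas as pd

# Shortened copy of the example frame (first three rows only).
df = pd.DataFrame({1: [1, 19, 20], 2: [2, 10, 20], 3: [3, 10, 20], 4: [4, 10, 20]})

# Positions are 0-based and the slice end is exclusive, so 2:4 picks
# the third and fourth columns (labelled 3 and 4 in this frame).
to_drop = df.columns[2:4]
trimmed = df.drop(columns=to_drop)  # returns a new DataFrame, df is unchanged

# Alternative: keep the first two columns by position instead of dropping.
kept = df.iloc[:, :2]

print(list(trimmed.columns))
print(trimmed.equals(kept))
```

Either form works without typing any column names; which one reads better probably depends on whether the range to drop or the range to keep is shorter.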



RE: deleting columns in CSV file - astral_travel - Nov-26-2022

alright ! thank you !
-------------------------

okay, here's a problem: each time i run the drop() method and delete the columns i want, it adds a column of numbers at the beginning, on the left (i think your example shows the same thing)...

how is it possible to avoid this ?


RE: deleting columns in CSV file - deanhystad - Nov-26-2022

A column is not added when you drop. Can you provide an example?


RE: deleting columns in CSV file - astral_travel - Nov-26-2022

i guess it's okay, but still,
here's the code:

import pandas as pd

data = pd.read_csv('/home/tal/investing/allstocks.csv')


print("Original 'allstocks.csv' CSV Data: \n")
print(data)

df = pd.DataFrame(data)

df.drop(df.columns[3:], axis=1, inplace=True)
df.to_csv('/home/tal/investing/allstocks.csv')


print("\nCSV Data after deleting the column 'year':\n")
print(df)
and attached is a screenshot of the result,
as you can see there's an added column at the beginning (the most left) of digits (signifying the row number),
isn't the row number already taken into account ? (or is it just in Excel where it gives the rows their numbers ?)
if a column is added to hold the row numbers (like an index), i have no problem with that. but suppose i run it again to delete another column that was not deleted the first time: it then adds yet another column (so the row index appears twice).


RE: deleting columns in CSV file - snippsat - Nov-26-2022

The row index is always shown first in a DataFrame.
Usually when you write data out of the DataFrame (e.g. to .csv) you don't want this index, so use index=False:
df.to_csv('/home/tal/investing/allstocks.csv', index=False)
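
A small sketch of why the extra column keeps appearing, using an in-memory string instead of the real file. The tickers and prices here are made up for illustration. When the index is written and the file is read back, the old index should come back as an ordinary data column, and the next save adds another one.

```python
import io
import pandas as pd

# Hypothetical data standing in for allstocks.csv.
df = pd.DataFrame({"Ticker": ["AAA", "BBB"], "Price": [1.0, 2.0]})

# Default: the row index is written as an unnamed first column.
with_index = df.to_csv()
# index=False: only the data columns are written.
without_index = df.to_csv(index=False)

# Reading the first version back turns the old index into a data column
# named "Unnamed: 0", and pandas puts a fresh index in front of it.
reread = pd.read_csv(io.StringIO(with_index))
print(list(reread.columns))

# If a file already contains such a column, index_col=0 reads it as the index.
reread_fixed = pd.read_csv(io.StringIO(with_index), index_col=0)
print(list(reread_fixed.columns))
```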



RE: deleting columns in CSV file - astral_travel - Nov-26-2022

alright !!
hehehe....that's awesome !
-----------------------------------

there's an error that came up in the IDE:
Error:
Name 'columnSeriesObj' can be undefined
here's the code:

import pandas as pd

data = pd.read_csv('/home/tal/investing/allstocks.csv')


print("Original 'allstocks.csv' CSV Data: \n")
print(data)

df = pd.DataFrame(data)

for column in df['Ticker']:
    columnSeriesObj = df['Ticker']
    print('Column Contents : ', columnSeriesObj.values)

columnSeriesObj.to_csv('/home/tal/investing/Ticker.csv', index=False)
although i got the wanted result, it produced the file i want with only one of the columns (the Ticker column), but why in the IDE it gives me this error ?


RE: deleting columns in CSV file - snippsat - Nov-26-2022

(Nov-26-2022, 05:23 PM)astral_travel Wrote: although i got the wanted result, it produced the file i want with only one of the columns (the Ticker column), but why in the IDE it gives me this error ?
The loop makes little sense, this does the same:
df['Ticker'].to_csv('/home/tal/investing/Ticker.csv', index=False)
The variable columnSeriesObj is created 3 times in the loop, where only the last one is used.
The columnSeriesObj.values call just prints the same values 3 times.

In Pandas, using loops is in many/most cases the wrong solution; you should use a vectorized solution instead (these are built in, and there are many).
So Pandas is a different way to program than the standard Python way. You are new to both, so it can be confusing.

Have a look at 30 Methods You Should Master To Become A Pandas Pro.
Notice that a for loop is not used even once.
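
As for the IDE message itself, a likely reason (a sketch, not a statement about what any particular IDE checks) is that columnSeriesObj is only assigned inside the loop body, so if the column were empty the loop would never run and the name would not exist afterwards. The function name series_via_loop and the sample tickers below are made up for illustration.

```python
import pandas as pd

def series_via_loop(df):
    # Mirrors the loop from the question: the name is bound inside the loop body.
    for _ in df["Ticker"]:
        column_series = df["Ticker"]
    return column_series  # unbound if the loop body never ran

full = pd.DataFrame({"Ticker": ["AAA", "BBB", "CCC"]})  # hypothetical data
empty = pd.DataFrame({"Ticker": []})

print(list(series_via_loop(full)))

try:
    series_via_loop(empty)
except UnboundLocalError as exc:
    print("loop never ran:", type(exc).__name__)

# Without the loop the name is always bound, so there is nothing to warn about.
column_series = empty["Ticker"]
print(len(column_series))
```

With real data the loop runs at least once, which would explain why the script still produced the file despite the warning.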


RE: deleting columns in CSV file - astral_travel - Nov-26-2022

thank you very much for the correction,
and for the link, that link is very useful !