Posts: 75
Threads: 14
Joined: Jul 2022
I obtain daily csv files where the header values contain leading\trailing space values which I want to remove. In some cases, there can be 2 spaces and in other cases only 1 e.g. ' Gross Sales '. I want, in this example to be 'Gross Sales'. How can I do that using Python code?
Posts: 453
Threads: 16
Joined: Jun 2022
You can use the .strip() method.
1 2 3 |
text = ' Gross Sales '
output = text.strip()
print (output)
|
Output: Gross Sales
Sig:
>>> import this
The UNIX philosophy: "Do one thing, and do it well."
"The danger of computers becoming like humans is not as great as the danger of humans becoming like computers." :~ Konrad Zuse
"Everything should be made as simple as possible, but not simpler." :~ Albert Einstein
Posts: 6,796
Threads: 20
Joined: Feb 2020
Oct-20-2022, 03:39 PM
(This post was last modified: Oct-20-2022, 03:39 PM by deanhystad.)
You can use a dialect. I have a csv file that looks like this:
Output: one, two, three
1, 2, 3
1, 2, 3
When I read this using the default dialect I get spaces in my header.
1 2 3 4 |
import csv
with open ( 'test.txt' , "r" ) as file :
for row in csv.reader( file ):
print (row)
|
Output: ['one', ' two', ' three']
['1', ' 2', ' 3']
['1', ' 2', ' 3']
See the leading space before two an three?
None of the provided dialects fixes the problem, but I can use a dialog sniffer to create a dialect for me.
1 2 3 4 5 6 |
import csv
with open ( 'test.txt' , "r" ) as file :
dialect = csv.Sniffer().sniff( file .read())
file .seek( 0 )
for row in csv.reader( file , dialect = dialect):
print (row)
|
['one', 'two', 'three']
['1', '2', '3']
['1', '2', '3']
You can also use different formats/dialects for the header and the table.
1 2 3 4 5 6 7 |
import csv
with open ( 'test.txt' , "r" ) as file :
header = next (csv.reader( file , skipinitialspace = True ))
print (header)
for row in csv.reader( file , quoting = csv.QUOTE_NONNUMERIC):
print (row)
|
1 2 3 |
[ 'one' , 'two' , 'three' ]
[ 1.0 , 2.0 , 3.0 ]
[ 1.0 , 2.0 , 3.0 ]
|
Or you can use pandas. Pandas is smarter than the csv library. It can handle both strings and numbers without needing different format specifiers.
1 2 3 4 5 |
import pandas as pd
df = pd.read_csv( "test.txt" , skipinitialspace = True )
print (df)
print (df.columns)
|
Output: one two three
0 1 2 3
1 1 2 3
Index(['one', 'two', 'three'], dtype='object')
The problem with all these solutions is they don't handle trailing whitespace. As far as can tell nothing handles trailing whitespace. I modify my csv file to look like this
Output: one , two , three
1 , 2 , 3
1 , 2 , 3
1 2 3 4 5 6 |
import pandas as pd
df = pd.read_csv( "test.txt" , skipinitialspace = True )
df[ "sum" ] = df. sum (axis = 1 )
print (df)
print (df.columns)
|
Output: one two three sum
0 1 2 3 6
1 1 2 3 6
Index(['one ', 'two ', 'three ', 'sum'], dtype='object')
Pandas strips off or ignores the trailing spaces when reading numbers, but it leaves the trailing spaces for strings. The only way I've found to do that is replace the columns with stripped versions of themselves.
1 2 3 4 5 6 7 |
import pandas as pd
df = pd.read_csv( "test.txt" )
df[ "sum" ] = df. sum (axis = 1 )
df.columns = [col.strip() for col in df.columns]
print (df)
print (df.columns)
|
Output: one two three sum
0 1 2 3 6
1 1 2 3 6
Index(['one', 'two', 'three', 'sum'], dtype='object')
Posts: 75
Threads: 14
Joined: Jul 2022
Thanks for that. I tried the last snippet of code and the print statement code gives me the headers without any leading\trailing spaces.
How would I adapt the above code to remove the leading\trailing spaces in the csv file and overwrite the file with the leading\trailing spaces removed in the header?
Posts: 6,796
Threads: 20
Joined: Feb 2020
Oct-20-2022, 06:32 PM
(This post was last modified: Oct-20-2022, 06:32 PM by deanhystad.)
I wouldn't use CSV at all.
I have a csv like file named data.txt
Output: id , name , Age
1 , Peter Parker, 1
2 , Charles Xavier , 2
3 , George of the Jungle , 3
It has leading and trailing spaces because someone tried to make it pretty and to lined up the columns.
1 2 3 4 5 6 7 |
delim = ","
with open ( "data.txt" , "r" ) as src:
with open ( "data.csv" , "w" ) as dst:
for line in src:
values = map ( str .strip, line.split(delim))
dst.write( "," .join(values) + "\n" )
|
# Put it back together, append linefeed and write
This is the result after running the program.
Output: id,name,Age
1,Peter Parker,1
2,Charles Xavier,2
3,George of the Jungle,3
If you want to overwrite source file.
1 2 3 4 5 6 7 8 |
delim = ","
with open ( "data.csv" , "r" ) as file :
lines = [ map ( str .strip, line.split(delim)) for line in file ]
with open ( "data.csv" , "w" ) as file :
for line in lines:
file .write( "," .join(line) + "\n" )
|
Posts: 75
Threads: 14
Joined: Jul 2022
Oct-22-2022, 12:18 AM
(This post was last modified: Oct-22-2022, 12:21 AM by azizrasul.)
The last bit of code worked a treat on a csv file. The only problem was that it got rid of leading\trailing spaces in the data. I only want the leading\trailing spaces in the first line removed. I tried the following, with my limited knowledge of Python: -
1 2 3 4 5 6 7 8 9 10 11 |
delim = ","
x = 0
with open ( "filtered_csv.csv" , "r" ) as file :
lines = [ map ( str .strip, line.split(delim)) for line in file ]
with open ( "filtered_csv.csv" , "w" ) as file :
for line in lines:
if x = = 0 :
file .write( "," .join(line) + "\n" )
x = x + 1
|
But all it did was to remove all the data and left only the header. I guess I need to do something different on the str.strip line but don't how to only iterate the first line.
Posts: 2,953
Threads: 48
Joined: Sep 2016
There is that readline method that reads one line at a time so you can get the first line process it and write the others untouched
Posts: 6,796
Threads: 20
Joined: Feb 2020
Or you could use an iterator and next.
1 2 3 4 5 6 7 8 9 |
delim = ","
with open ( "data.csv" , "r" ) as file :
lines = iter ( list ( file ))
with open ( "data.txt" , "w" ) as file :
file .write( "," .join( map ( str .strip, next (lines).split(delim))) + "\n" )
file .writelines(lines)
|
Why do you want to leave spaces in the rest of the file?
Posts: 75
Threads: 14
Joined: Jul 2022
Thanks both. The code worked a treat. Trying to understand how the code reads only the first line?
I'm only interested in the header row as I need to create relationships with the same header names that don't contain the spaces.
Thanks for the code that got rid of spaces in all the rows. Should come in useful in the future.
|