Removing leading\trailing spaces

azizrasul · Oct-20-2022, 02:41 AM

I obtain daily csv files where the header values contain leading\trailing space values which I want to remove. In some cases, there can be 2 spaces and in other cases only 1 e.g. ' Gross Sales '. I want, in this example to be 'Gross Sales'. How can I do that using Python code?

rob101 · Oct-20-2022, 04:01 AM

You can use the .strip() method.

        
              text = ' Gross Sales '
output = text.strip()
print(output)

Output:
Gross Sales

**deanhystad** · (This post was last modified: Oct-20-2022, 03:39 PM by deanhystad.)

You can use a dialect. I have a csv file that looks like this:

Output:one, two, three
1, 2, 3
1, 2, 3

When I read this using the default dialect I get spaces in my header.

        
              import csv
with open('test.txt', "r") as file:
    for row in csv.reader(file):
        print(row)

Output:['one', ' two', ' three']
['1', ' 2', ' 3']
['1', ' 2', ' 3']

See the leading space before two an three?

None of the provided dialects fixes the problem, but I can use a dialog sniffer to create a dialect for me.

        
              import csv
with open('test.txt', "r") as file:
    dialect = csv.Sniffer().sniff(file.read())
    file.seek(0)
    for row in csv.reader(file, dialect=dialect):
        print(row)

['one', 'two', 'three']
['1', '2', '3']
['1', '2', '3']
You can also use different formats/dialects for the header and the table.

        
              import csv
  
with open('test.txt', "r") as file:
    header = next(csv.reader(file, skipinitialspace=True))
    print(header)
    for row in csv.reader(file, quoting=csv.QUOTE_NONNUMERIC):
        print(row)

        
              ['one', 'two', 'three']
[1.0, 2.0, 3.0]
[1.0, 2.0, 3.0]

Or you can use pandas. Pandas is smarter than the csv library. It can handle both strings and numbers without needing different format specifiers.

        
              import pandas as pd
 
df = pd.read_csv("test.txt", skipinitialspace=True)
print(df)
print(df.columns)

Output:   one  two  three
0    1    2      3
1    1    2      3
Index(['one', 'two', 'three'], dtype='object')

The problem with all these solutions is they don't handle trailing whitespace. As far as can tell nothing handles trailing whitespace. I modify my csv file to look like this

Output: one , two , three 
1 , 2 , 3 
1 , 2 , 3

        
              import pandas as pd
 
df = pd.read_csv("test.txt", skipinitialspace=True)
df["sum"] = df.sum(axis=1)
print(df)
print(df.columns)

Output:   one   two   three   sum
0     1     2       3    6
1     1     2       3    6
Index(['one ', 'two ', 'three ', 'sum'], dtype='object')

Pandas strips off or ignores the trailing spaces when reading numbers, but it leaves the trailing spaces for strings. The only way I've found to do that is replace the columns with stripped versions of themselves.

        
              import pandas as pd
 
df = pd.read_csv("test.txt")
df["sum"] = df.sum(axis=1)
df.columns = [col.strip() for col in df.columns]
print(df)
print(df.columns)

Output:   one  two  three  sum
0    1    2      3    6
1    1    2      3    6
Index(['one', 'two', 'three', 'sum'], dtype='object')

azizrasul · Oct-20-2022, 06:01 PM

Thanks for that. I tried the last snippet of code and the print statement code gives me the headers without any leading\trailing spaces.

How would I adapt the above code to remove the leading\trailing spaces in the csv file and overwrite the file with the leading\trailing spaces removed in the header?

**deanhystad** · (This post was last modified: Oct-20-2022, 06:32 PM by deanhystad.)

I wouldn't use CSV at all.

I have a csv like file named data.txt

Output:id , name ,                 Age
1 ,  Peter Parker,          1
2 ,  Charles Xavier ,       2
3 ,  George of the Jungle , 3

It has leading and trailing spaces because someone tried to make it pretty and to lined up the columns.

        
              delim = ","
 
with open("data.txt", "r") as src:
    with open("data.csv", "w") as dst:
        for line in src:
            values = map(str.strip, line.split(delim))  # Get columns with left and right whitespace removed
            dst.write(",".join(values)+"\n")

# Put it back together, append linefeed and write
This is the result after running the program.

Output:id,name,Age
1,Peter Parker,1
2,Charles Xavier,2
3,George of the Jungle,3

If you want to overwrite source file.

        
              delim = ","
 
with open("data.csv", "r") as file:
    lines = [map(str.strip, line.split(delim)) for line in file]
 
with open("data.csv", "w") as file:
    for line in lines:
        file.write(",".join(line)+"\n")

azizrasul · (This post was last modified: Oct-22-2022, 12:21 AM by azizrasul.)

The last bit of code worked a treat on a csv file. The only problem was that it got rid of leading\trailing spaces in the data. I only want the leading\trailing spaces in the first line removed. I tried the following, with my limited knowledge of Python: -

        
              delim = ","
x=0
 
with open("filtered_csv.csv", "r") as file:
    lines = [map(str.strip, line.split(delim)) for line in file]
 
with open("filtered_csv.csv", "w") as file:
        for line in lines:
            if x==0:
                file.write(",".join(line) + "\n")
                x=x+1

But all it did was to remove all the data and left only the header. I guess I need to do something different on the str.strip line but don't how to only iterate the first line.

wavic · Oct-22-2022, 05:27 AM

There is that readline method that reads one line at a time so you can get the first line process it and write the others untouched

**deanhystad** · Oct-22-2022, 06:21 AM

Or you could use an iterator and next.

        
              # Read in all the lines
delim = ","
with open("data.csv", "r") as file:
    lines = iter(list(file))
 
with open("data.txt", "w") as file:
    # Strip spaces in header row
    file.write(",".join(map(str.strip, next(lines).split(delim)))+"\n")
    file.writelines(lines)  # Copy other lines unchanged

Why do you want to leave spaces in the rest of the file?

azizrasul · Oct-23-2022, 11:06 PM

Thanks both. The code worked a treat. Trying to understand how the code reads only the first line?

I'm only interested in the header row as I need to create relationships with the same header names that don't contain the spaces.

Thanks for the code that got rid of spaces in all the rows. Should come in useful in the future.

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Removing leading whitespaces	palladium	1	1,357	Mar-24-2023, 04:15 PM Last Post: bowlofred
	How to retrieve records in a DataFrame (Python/Pandas) that contains leading or trail	mmunozjr	3	3,588	Sep-05-2022, 11:56 AM Last Post: Pedroski55
	-i option changes sys.path (removes leading empty string '')	markanth	6	3,369	Aug-26-2022, 09:27 PM Last Post: markanth
	removing spaces	msaiahnl	2	1,778	Jul-25-2022, 03:34 PM Last Post: deanhystad
	How to keep leading zeros with pandas?	eeps24	1	8,583	May-20-2020, 07:51 PM Last Post: deanhystad
	Cancelling 'open directory' leading to crash	Fairbz_	1	2,732	May-08-2020, 03:14 PM Last Post: DPaul
	[Python3] Trailing newline in readlines.	LWFlouisa	4	6,535	Mar-10-2020, 09:57 AM Last Post: perfringo
	removing spaces/tabs after used .strip()	zarize	0	2,022	Sep-11-2019, 12:46 PM Last Post: zarize
	leading zero number formatting	RedSkeleton007	3	4,844	Jan-27-2019, 04:56 PM Last Post: RedSkeleton007
	Probs with leading zeros falling away :-)	Badosc	2	3,509	Dec-04-2018, 08:57 PM Last Post: Badosc

Removing leading\trailing spaces

User Panel Messages

Announcements