Python Forum
Removing leading\trailing spaces - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: Removing leading\trailing spaces (/thread-38492.html)



Removing leading\trailing spaces - azizrasul - Oct-20-2022

I obtain daily csv files where the header values contain leading\trailing space values which I want to remove. In some cases, there can be 2 spaces and in other cases only 1 e.g. ' Gross Sales '. I want, in this example to be 'Gross Sales'. How can I do that using Python code?


RE: Removing leading\trailing spaces - rob101 - Oct-20-2022

You can use the .strip() method.

text = ' Gross Sales '
output = text.strip()
print(output)
Output:
Gross Sales



RE: Removing leading\trailing spaces - deanhystad - Oct-20-2022

You can use a dialect. I have a csv file that looks like this:
Output:
one, two, three 1, 2, 3 1, 2, 3
When I read this using the default dialect I get spaces in my header.
import csv
with open('test.txt', "r") as file:
    for row in csv.reader(file):
        print(row)
Output:
['one', ' two', ' three'] ['1', ' 2', ' 3'] ['1', ' 2', ' 3']
See the leading space before two an three?

None of the provided dialects fixes the problem, but I can use a dialog sniffer to create a dialect for me.
import csv
with open('test.txt', "r") as file:
    dialect = csv.Sniffer().sniff(file.read())
    file.seek(0)
    for row in csv.reader(file, dialect=dialect):
        print(row)
['one', 'two', 'three']
['1', '2', '3']
['1', '2', '3']
You can also use different formats/dialects for the header and the table.
import csv
 
with open('test.txt', "r") as file:
    header = next(csv.reader(file, skipinitialspace=True))
    print(header)
    for row in csv.reader(file, quoting=csv.QUOTE_NONNUMERIC):
        print(row)
['one', 'two', 'three']
[1.0, 2.0, 3.0]
[1.0, 2.0, 3.0]
Or you can use pandas. Pandas is smarter than the csv library. It can handle both strings and numbers without needing different format specifiers.
import pandas as pd

df = pd.read_csv("test.txt", skipinitialspace=True)
print(df)
print(df.columns)
Output:
one two three 0 1 2 3 1 1 2 3 Index(['one', 'two', 'three'], dtype='object')
The problem with all these solutions is they don't handle trailing whitespace. As far as can tell nothing handles trailing whitespace. I modify my csv file to look like this
Output:
one , two , three 1 , 2 , 3 1 , 2 , 3
import pandas as pd

df = pd.read_csv("test.txt", skipinitialspace=True)
df["sum"] = df.sum(axis=1)
print(df)
print(df.columns)
Output:
one two three sum 0 1 2 3 6 1 1 2 3 6 Index(['one ', 'two ', 'three ', 'sum'], dtype='object')
Pandas strips off or ignores the trailing spaces when reading numbers, but it leaves the trailing spaces for strings. The only way I've found to do that is replace the columns with stripped versions of themselves.
import pandas as pd

df = pd.read_csv("test.txt")
df["sum"] = df.sum(axis=1)
df.columns = [col.strip() for col in df.columns]
print(df)
print(df.columns)
Output:
one two three sum 0 1 2 3 6 1 1 2 3 6 Index(['one', 'two', 'three', 'sum'], dtype='object')



RE: Removing leading\trailing spaces - azizrasul - Oct-20-2022

Thanks for that. I tried the last snippet of code and the print statement code gives me the headers without any leading\trailing spaces.

How would I adapt the above code to remove the leading\trailing spaces in the csv file and overwrite the file with the leading\trailing spaces removed in the header?


RE: Removing leading\trailing spaces - deanhystad - Oct-20-2022

I wouldn't use CSV at all.

I have a csv like file named data.txt
Output:
id , name , Age 1 , Peter Parker, 1 2 , Charles Xavier , 2 3 , George of the Jungle , 3
It has leading and trailing spaces because someone tried to make it pretty and to lined up the columns.
delim = ","

with open("data.txt", "r") as src:
    with open("data.csv", "w") as dst:
        for line in src:
            values = map(str.strip, line.split(delim))  # Get columns with left and right whitespace removed
            dst.write(",".join(values)+"\n")
# Put it back together, append linefeed and write
This is the result after running the program.
Output:
id,name,Age 1,Peter Parker,1 2,Charles Xavier,2 3,George of the Jungle,3
If you want to overwrite source file.
delim = ","

with open("data.csv", "r") as file:
    lines = [map(str.strip, line.split(delim)) for line in file]

with open("data.csv", "w") as file:
    for line in lines:
        file.write(",".join(line)+"\n")



RE: Removing leading\trailing spaces - azizrasul - Oct-22-2022

The last bit of code worked a treat on a csv file. The only problem was that it got rid of leading\trailing spaces in the data. I only want the leading\trailing spaces in the first line removed. I tried the following, with my limited knowledge of Python: -

delim = ","
x=0

with open("filtered_csv.csv", "r") as file:
    lines = [map(str.strip, line.split(delim)) for line in file]

with open("filtered_csv.csv", "w") as file:
        for line in lines:
            if x==0:
                file.write(",".join(line) + "\n")
                x=x+1
But all it did was to remove all the data and left only the header. I guess I need to do something different on the str.strip line but don't how to only iterate the first line.


RE: Removing leading\trailing spaces - wavic - Oct-22-2022

There is that readline method that reads one line at a time so you can get the first line process it and write the others untouched


RE: Removing leading\trailing spaces - deanhystad - Oct-22-2022

Or you could use an iterator and next.
# Read in all the lines
delim = ","
with open("data.csv", "r") as file:
    lines = iter(list(file))

with open("data.txt", "w") as file:
    # Strip spaces in header row
    file.write(",".join(map(str.strip, next(lines).split(delim)))+"\n")
    file.writelines(lines)  # Copy other lines unchanged
Why do you want to leave spaces in the rest of the file?


RE: Removing leading\trailing spaces - azizrasul - Oct-23-2022

Thanks both. The code worked a treat. Trying to understand how the code reads only the first line?

I'm only interested in the header row as I need to create relationships with the same header names that don't contain the spaces.

Thanks for the code that got rid of spaces in all the rows. Should come in useful in the future.