Python Forum
Removing leading\trailing spaces
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Removing leading\trailing spaces
#1
I obtain daily csv files where the header values contain leading\trailing space values which I want to remove. In some cases, there can be 2 spaces and in other cases only 1 e.g. ' Gross Sales '. I want, in this example to be 'Gross Sales'. How can I do that using Python code?
Reply
#2
You can use the .strip() method.

text = ' Gross Sales '
output = text.strip()
print(output)
Output:
Gross Sales
Sig:
>>> import this

The UNIX philosophy: "Do one thing, and do it well."

"The danger of computers becoming like humans is not as great as the danger of humans becoming like computers." :~ Konrad Zuse

"Everything should be made as simple as possible, but not simpler." :~ Albert Einstein
Reply
#3
You can use a dialect. I have a csv file that looks like this:
Output:
one, two, three 1, 2, 3 1, 2, 3
When I read this using the default dialect I get spaces in my header.
import csv
with open('test.txt', "r") as file:
    for row in csv.reader(file):
        print(row)
Output:
['one', ' two', ' three'] ['1', ' 2', ' 3'] ['1', ' 2', ' 3']
See the leading space before two an three?

None of the provided dialects fixes the problem, but I can use a dialog sniffer to create a dialect for me.
import csv
with open('test.txt', "r") as file:
    dialect = csv.Sniffer().sniff(file.read())
    file.seek(0)
    for row in csv.reader(file, dialect=dialect):
        print(row)
['one', 'two', 'three']
['1', '2', '3']
['1', '2', '3']
You can also use different formats/dialects for the header and the table.
import csv
 
with open('test.txt', "r") as file:
    header = next(csv.reader(file, skipinitialspace=True))
    print(header)
    for row in csv.reader(file, quoting=csv.QUOTE_NONNUMERIC):
        print(row)
['one', 'two', 'three']
[1.0, 2.0, 3.0]
[1.0, 2.0, 3.0]
Or you can use pandas. Pandas is smarter than the csv library. It can handle both strings and numbers without needing different format specifiers.
import pandas as pd

df = pd.read_csv("test.txt", skipinitialspace=True)
print(df)
print(df.columns)
Output:
one two three 0 1 2 3 1 1 2 3 Index(['one', 'two', 'three'], dtype='object')
The problem with all these solutions is they don't handle trailing whitespace. As far as can tell nothing handles trailing whitespace. I modify my csv file to look like this
Output:
one , two , three 1 , 2 , 3 1 , 2 , 3
import pandas as pd

df = pd.read_csv("test.txt", skipinitialspace=True)
df["sum"] = df.sum(axis=1)
print(df)
print(df.columns)
Output:
one two three sum 0 1 2 3 6 1 1 2 3 6 Index(['one ', 'two ', 'three ', 'sum'], dtype='object')
Pandas strips off or ignores the trailing spaces when reading numbers, but it leaves the trailing spaces for strings. The only way I've found to do that is replace the columns with stripped versions of themselves.
import pandas as pd

df = pd.read_csv("test.txt")
df["sum"] = df.sum(axis=1)
df.columns = [col.strip() for col in df.columns]
print(df)
print(df.columns)
Output:
one two three sum 0 1 2 3 6 1 1 2 3 6 Index(['one', 'two', 'three', 'sum'], dtype='object')
Larz60+ likes this post
Reply
#4
Thanks for that. I tried the last snippet of code and the print statement code gives me the headers without any leading\trailing spaces.

How would I adapt the above code to remove the leading\trailing spaces in the csv file and overwrite the file with the leading\trailing spaces removed in the header?
Reply
#5
I wouldn't use CSV at all.

I have a csv like file named data.txt
Output:
id , name , Age 1 , Peter Parker, 1 2 , Charles Xavier , 2 3 , George of the Jungle , 3
It has leading and trailing spaces because someone tried to make it pretty and to lined up the columns.
delim = ","

with open("data.txt", "r") as src:
    with open("data.csv", "w") as dst:
        for line in src:
            values = map(str.strip, line.split(delim))  # Get columns with left and right whitespace removed
            dst.write(",".join(values)+"\n")
# Put it back together, append linefeed and write
This is the result after running the program.
Output:
id,name,Age 1,Peter Parker,1 2,Charles Xavier,2 3,George of the Jungle,3
If you want to overwrite source file.
delim = ","

with open("data.csv", "r") as file:
    lines = [map(str.strip, line.split(delim)) for line in file]

with open("data.csv", "w") as file:
    for line in lines:
        file.write(",".join(line)+"\n")
Reply
#6
The last bit of code worked a treat on a csv file. The only problem was that it got rid of leading\trailing spaces in the data. I only want the leading\trailing spaces in the first line removed. I tried the following, with my limited knowledge of Python: -

delim = ","
x=0

with open("filtered_csv.csv", "r") as file:
    lines = [map(str.strip, line.split(delim)) for line in file]

with open("filtered_csv.csv", "w") as file:
        for line in lines:
            if x==0:
                file.write(",".join(line) + "\n")
                x=x+1
But all it did was to remove all the data and left only the header. I guess I need to do something different on the str.strip line but don't how to only iterate the first line.
Reply
#7
There is that readline method that reads one line at a time so you can get the first line process it and write the others untouched
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
Reply
#8
Or you could use an iterator and next.
# Read in all the lines
delim = ","
with open("data.csv", "r") as file:
    lines = iter(list(file))

with open("data.txt", "w") as file:
    # Strip spaces in header row
    file.write(",".join(map(str.strip, next(lines).split(delim)))+"\n")
    file.writelines(lines)  # Copy other lines unchanged
Why do you want to leave spaces in the rest of the file?
Reply
#9
Thanks both. The code worked a treat. Trying to understand how the code reads only the first line?

I'm only interested in the header row as I need to create relationships with the same header names that don't contain the spaces.

Thanks for the code that got rid of spaces in all the rows. Should come in useful in the future.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Removing leading whitespaces palladium 1 732 Mar-24-2023, 04:15 PM
Last Post: bowlofred
  How to retrieve records in a DataFrame (Python/Pandas) that contains leading or trail mmunozjr 3 1,765 Sep-05-2022, 11:56 AM
Last Post: Pedroski55
  -i option changes sys.path (removes leading empty string '') markanth 6 1,994 Aug-26-2022, 09:27 PM
Last Post: markanth
  removing spaces msaiahnl 2 1,101 Jul-25-2022, 03:34 PM
Last Post: deanhystad
  How to keep leading zeros with pandas? eeps24 1 6,581 May-20-2020, 07:51 PM
Last Post: deanhystad
  Cancelling 'open directory' leading to crash Fairbz_ 1 2,192 May-08-2020, 03:14 PM
Last Post: DPaul
  [Python3] Trailing newline in readlines. LWFlouisa 4 4,867 Mar-10-2020, 09:57 AM
Last Post: perfringo
  removing spaces/tabs after used .strip() zarize 0 1,603 Sep-11-2019, 12:46 PM
Last Post: zarize
  leading zero number formatting RedSkeleton007 3 3,948 Jan-27-2019, 04:56 PM
Last Post: RedSkeleton007
  Probs with leading zeros falling away :-) Badosc 2 2,887 Dec-04-2018, 08:57 PM
Last Post: Badosc

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020