You can use a dialect. I have a CSV file, test.txt, that looks like this:
one, two, three
1, 2, 3
1, 2, 3
When I read this using the default dialect I get spaces in my header.
import csv

with open('test.txt', "r", newline="") as file:
    # csv.reader uses the "excel" dialect unless told otherwise
    for row in csv.reader(file):
        print(row)
Output:
['one', ' two', ' three']
['1', ' 2', ' 3']
['1', ' 2', ' 3']
See the leading space before two and three?
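Those spaces survive because the reader's skipinitialspace setting is off. As a quick check, here is a small sketch that lists the dialects registered with the csv module and prints that setting for each one. The variable name `builtin` is just mine, and the result assumes no extra dialects have been registered in the session.

```python
import csv

# Map each registered dialect name to its skipinitialspace setting.
builtin = {
    name: csv.get_dialect(name).skipinitialspace
    for name in csv.list_dialects()
}
print(builtin)
```

If every value printed is False, picking a different registered dialect will not remove the leading spaces.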
None of the provided dialects fixes the problem, but I can use a dialect sniffer (csv.Sniffer) to create a dialect for me.
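import csv

with open('test.txt', "r", newline="") as file:
    # Let the sniffer infer the dialect from the file contents
    dialect = csv.Sniffer().sniff(file.read())
    file.seek(0)  # rewind so the reader starts at the first line
    for row in csv.reader(file, dialect=dialect):
        print(row)
Output: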
['one', 'two', 'three']
['1', '2', '3']
['1', '2', '3']
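To see what the sniffer actually picked up, you can look at the attributes of the dialect it returns. This is only a sketch: it uses an in-memory string with the same text as test.txt (my assumption, so it runs on its own), and the sniffer works by heuristics, so other files may be guessed differently.

```python
import csv
import io

text = "one, two, three\n1, 2, 3\n1, 2, 3\n"  # stand-in for test.txt

# The sniffer returns a Dialect subclass; inspect the settings it guessed.
dialect = csv.Sniffer().sniff(text)
print(repr(dialect.delimiter), dialect.skipinitialspace)

# For this file, passing the setting directly gives the same rows.
rows_sniffed = list(csv.reader(io.StringIO(text), dialect=dialect))
rows_explicit = list(csv.reader(io.StringIO(text), skipinitialspace=True))
print(rows_sniffed == rows_explicit)
```

So if you already know the file has a space after each comma, skipinitialspace=True does the job without sniffing.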
You can also use different formats/dialects for the header and the table.
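import csv

with open('test.txt', "r", newline="") as file:
    # Header: skip the space after each comma
    header = next(csv.reader(file, skipinitialspace=True))
    print(header)
    # Body: QUOTE_NONNUMERIC converts every unquoted field to float
    for row in csv.reader(file, quoting=csv.QUOTE_NONNUMERIC):
        print(row)
Output: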
['one', 'two', 'three']
[1.0, 2.0, 3.0]
[1.0, 2.0, 3.0]
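The header/body split can be wrapped in a small helper. This is a sketch, not a general solution: `read_numeric_table` is a name I made up, the input is an in-memory copy of test.txt, and it assumes every body field is an unquoted number. With QUOTE_NONNUMERIC, an unquoted field that is not a number raises ValueError, which the second half demonstrates.

```python
import csv
import io

def read_numeric_table(lines):
    """Return (header, rows): header without leading spaces, rows as floats."""
    header = next(csv.reader(lines, skipinitialspace=True))
    rows = list(csv.reader(lines, quoting=csv.QUOTE_NONNUMERIC))
    return header, rows

sample = io.StringIO("one, two, three\n1, 2, 3\n1, 2, 3\n")
header, rows = read_numeric_table(sample)
print(header)
print(rows)

# An unquoted, non-numeric field in the body cannot be converted to float.
bad_body_failed = False
try:
    read_numeric_table(io.StringIO("a, b\n1, x\n"))
except ValueError as exc:
    bad_body_failed = True
    print("not numeric:", exc)
```

That limitation is why this approach only suits tables whose body is entirely numeric.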
Or you can use pandas. pandas is smarter than the csv library: it infers a type for each column, so it can handle both strings and numbers without needing different format specifiers.
import pandas as pd
df = pd.read_csv("test.txt", skipinitialspace=True)
print(df)
print(df.columns)
Output:
   one  two  three
0    1    2      3
1    1    2      3
Index(['one', 'two', 'three'], dtype='object')
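To illustrate the mixed strings-and-numbers point, here is a minimal sketch with a made-up two-column file held in memory (the column names `name` and `qty` are my own, not from test.txt). It prints the type pandas chose for each column; the exact name shown for the string column depends on your pandas version.

```python
import io

import pandas as pd

# Hypothetical file with one text column and one numeric column
sample = io.StringIO("name, qty\nalice, 1\nbob, 2\n")
df = pd.read_csv(sample, skipinitialspace=True)

print(df.dtypes)  # one dtype per column, inferred from the data
names = df["name"].tolist()
quantities = df["qty"].tolist()
print(names, quantities)
```

The same read_csv call handles both columns, whereas the csv module needed separate reader settings for text and numbers.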
The problem with all these solutions is that they don't handle trailing whitespace. As far as I can tell, nothing handles trailing whitespace. I modified my CSV file to look like this:
one , two , three
1 , 2 , 3
1 , 2 , 3
import pandas as pd
df = pd.read_csv("test.txt", skipinitialspace=True)
df["sum"] = df.sum(axis=1)
print(df)
print(df.columns)
Output:
   one   two   three   sum
0     1     2       3    6
1     1     2       3    6
Index(['one ', 'two ', 'three ', 'sum'], dtype='object')
pandas strips off or ignores the trailing spaces when reading numbers, but it leaves the trailing spaces on strings, which here means the column names. The only way I've found to fix that is to replace the column names with stripped versions of themselves.
import pandas as pd
df = pd.read_csv("test.txt")
df["sum"] = df.sum(axis=1)
df.columns = [col.strip() for col in df.columns]
print(df)
print(df.columns)
Output:
   one  two  three  sum
0    1    2      3    6
1    1    2      3    6
Index(['one', 'two', 'three', 'sum'], dtype='object')
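The same strip-it-yourself idea carries over to the csv module. This is a sketch rather than a built-in feature: `stripped_rows` is a helper name I made up, and the sample is an in-memory copy of the file with trailing spaces. Note that it strips every field, including any whitespace inside quoted fields, which you may not want.

```python
import csv
import io

def stripped_rows(lines, **reader_kwargs):
    """Yield each csv row with whitespace stripped from both ends of every field."""
    for row in csv.reader(lines, **reader_kwargs):
        yield [field.strip() for field in row]

sample = io.StringIO("one , two , three\n1 , 2 , 3\n1 , 2 , 3\n")
rows = list(stripped_rows(sample))
print(rows)
```

Because str.strip removes leading whitespace as well, this also covers the case that skipinitialspace handled earlier.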