PANDAS: DataFrame | White Spaces & Special Character Removal

traibr · Sep-09-2020, 12:32 PM

Hi All,

Just to give you an idea, I'm new to python & coding in general, Tongue

, however I've been in IT for 14+ years...

I'm building an automated task to clean CSV data produced by one of our systems. Reason being, we use a automation application that sends out emails to customers & references that data to build the emails in question.

However the data produced by the system sometimes has white spaces or full stop's present in a number of the columns & rows produced Confused

(No idea as to why??). I manually go through this data every morning & clean it - records ranging from 20,000 > 2000... Sad

I'm getting a little sick & tired of doing this... & I know there's always a smarter way to work; & I guess learning to code Big Grin

Traceback (most recent call last):
  File "C:/Users/username/PycharmProjects/pythonProject/ManipulateCSVDataNRemovesErrors.py", line 19, in <module>
    df['Text6'] = df['Text6'].str.replace(" ", "")
  File "C:\Users\username\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\generic.py", line 5126, in __getattr__
    return object.__getattribute__(self, name)
  File "C:\Users\username\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\accessor.py", line 187, in __get__
    accessor_obj = self._accessor(obj)
  File "C:\Users\username\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\strings.py", line 2100, in __init__
    self._inferred_dtype = self._validate(data)
  File "C:\Users\username\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\strings.py", line 2157, in _validate
    raise AttributeError("Can only use .str accessor with string values!")
AttributeError: Can only use .str accessor with string values!

The code I've built is below; please be gentle, I know this is probably not the best way to arrange code or even design it.. Blush

import pandas as pd


data = pd.read_csv(r'EMAIL9th Sep 2020_000000.csv')
df = pd.DataFrame(data, columns=['Letter Name', 'Our_Reference', 'Debtors Fullname', 'Contact Name', 'Salutation',
                                 'Mailing_Add1', 'Mailing_Add2', 'Mailing_Add3', 'Mailing_Add4', 'Mailing_Add5',
                                 'Mailing_Pcode', 'Client Name', 'Client Reference', 'Collector', 'Currency',
                                 'Balance', 'Initial Payment', 'No. Of Payments', 'Amount Agreed', 'Payable',
                                 'Starting From', 'Final Payment', 'LG Principal', 'LG Interest', 'LG Court Costs',
                                 'LG Solictor Cost', 'LG Warrant Fee', 'LG Total Due', 'Creditor Name',
                                 'Letter Header', 'Collector', 'Debtor Email', 'Inbound Tel', 'Text11', 'Text12',
                                 'Text1', 'Text2', 'Text3', 'Text4', 'Text5', 'Text6', 'Text7', 'Text8', 'Text9',
                                 'Text10', 'Date1', 'Date2', 'Date3', 'Date4', 'Date5', 'Date6', 'Number1',
                                 'Number2', 'Number3', 'Number4'])


df['Inbound Tel'] = df['Inbound Tel'].str.replace(" ", "")
df['Text11'] = df['Text11'].str.replace(".", "")
df['Text6'] = df['Text6'].str.replace(" ", "")

df.to_csv(r'EMAIL9th Sep 2020_000000_Mod.CSV', index=False)
print(df)

Now I understand this error "AttributeError: Can only use .str accessor with string values!" or I think I do, sting's are variables that hold characters & an integer's are variables that hold numerical values.

The three specified columns "Inbound Tel", "Text11" & "Text6" are set to general formatting when opened in excel (however, this is always by default)... So my question is, why does the code work on column "Inbound Tel" & not "Text6"?

Ever article or solution on Google when it comes to white spaces has pointed me to the following;

df['ColunmName'] = df['ColunmName'].str.replace(" ", "")

.

I've also tried

df['ColunmName'] = df['ColunmName'].__int__.replace(" ", "")

, this was unsuccessful... The following does not work either

df['ColunmName'] = df['ColunmName'].replace(" ", "")

.

However the following does work

df['ColunmName'] = df['ColunmName'].replace(".", "")

on 'Text11' Huh

I have also changed the data to RAW text & then imported it into excel, saved it as a CSV & it has produced a different result... The code now stops at "Inbound Tel"; Please note, below ERROR is now at "Inbound Tel" Huh

Traceback (most recent call last): 
  File "C:/Users/username/PycharmProjects/pythonProject/ManipulateCSVDataNRemovesErrors.py", line 17, in <module>
    df['Inbound Tel'] = df['Inbound Tel'].str.replace(" ", "")
  File "C:\Users\username\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\generic.py", line 5126, in __getattr__
    return object.__getattribute__(self, name)
  File "C:\Users\username\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\accessor.py", line 187, in __get__
    accessor_obj = self._accessor(obj)
  File "C:\Users\username\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\strings.py", line 2100, in __init__
    self._inferred_dtype = self._validate(data)
  File "C:\Users\username\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\strings.py", line 2157, in _validate
    raise AttributeError("Can only use .str accessor with string values!")
AttributeError: Can only use .str accessor with string values!

I've arrived at the conclusion that I need to format the CSV file to the correct formatting style before running the code I've created, I'm not sure where to start though - possibly "Styler.apply", but I've also now be left in limbo due to the above errors & all three columns being set to general...

Any help or advise on what the heck is going on would be would be much appreciated!

Look forward to your response Smile

Kind Regards,
Richard

eddywinch82 · (This post was last modified: Sep-10-2020, 07:02 PM by eddywinch82.)

Hi traibr,

Try the following, I hope they work for you :-

df['Inbound Tel'] = df['Inbound Tel'].str.replace({" ": ""})

Or

df['Inbound Tel'] = df['Inbound Tel'].replace({" ": ""})

And

df['Text6'] = df['Text6'].str.replace({" ": ""})

Or

df['Text6'] = df['Text6'].replace({" ": ""})

Or insert the Line :-

df.columns = ['Inbound Tel', 'Text6']

Above these Lines in your Code :-

df['Inbound Tel'] = df['Inbound Tel'].str.replace(" ", "")
df['Text11'] = df['Text11'].str.replace(".", "")
df['Text6'] = df['Text6'].str.replace(" ", "")

Hash out the Inbound Tel, and Text6 Lines of the above lines of Code, and use the following line of Code instead,
in place of the hashed out lines of Code :-

df.columns = df.columns.str.replace(" ", "")

Or

df.columns = df.columns.str.replace(' ', '')

I hope some of those work for you.

Best Regards

Eddie ))

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	[Solved] Formatting cells of a pandas dataframe into an OpenDocument ods spreadsheet	Calab	1	736	Mar-01-2025, 04:51 AM Last Post: Calab
	Find duplicates in a pandas dataframe list column on other rows	Calab	2	2,268	Sep-18-2024, 07:38 PM Last Post: Calab
	Find strings by index from a list of indexes in a different Pandas dataframe column	Calab	3	1,662	Aug-26-2024, 04:52 PM Last Post: Calab
	Add NER output to pandas dataframe	dg3000	0	1,174	Apr-22-2024, 08:14 PM Last Post: dg3000
	HTML Decoder pandas dataframe column	mbrown009	3	2,729	Sep-29-2023, 05:56 PM Last Post: deanhystad
	Use pandas to obtain cartesian product between a dataframe of int and equations?	haihal	0	2,042	Jan-06-2023, 10:53 PM Last Post: haihal
	Pandas Dataframe Filtering based on rows	mvdlm	0	2,095	Apr-02-2022, 06:39 PM Last Post: mvdlm
	Pandas dataframe: calculate metrics by year	mcva	1	3,458	Mar-02-2022, 08:22 AM Last Post: mcva
	Pandas dataframe comparing	anto5	0	1,937	Jan-30-2022, 10:21 AM Last Post: anto5
	PANDAS: DataFrame \| Replace and others questions	moduki1	2	2,681	Jan-10-2022, 07:19 PM Last Post: moduki1

PANDAS: DataFrame | White Spaces & Special Character Removal

User Panel Messages

Announcements