Python Forum
PANDAS: DataFrame | White Spaces & Special Character Removal
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
PANDAS: DataFrame | White Spaces & Special Character Removal
#1
Hi All,

Just to give you an idea, I'm new to python & coding in general, Tongue , however I've been in IT for 14+ years...

I'm building an automated task to clean CSV data produced by one of our systems. Reason being, we use a automation application that sends out emails to customers & references that data to build the emails in question.

However the data produced by the system sometimes has white spaces or full stop's present in a number of the columns & rows produced Confused (No idea as to why??). I manually go through this data every morning & clean it - records ranging from 20,000 > 2000... Sad

I'm getting a little sick & tired of doing this... & I know there's always a smarter way to work; & I guess learning to code Big Grin is the next best thing, so this is win win situation, once i get the hang of it!

CSV Example - Layout

Letter | Our_Reference | Fullname | Contact Name | Mailing_Add1 | Mailing_Pcode | Client Name | Client Reference | InboundTel | Text11 | Text6
TempLa | 12168001 | Joe Blog | 07528945863 | 2 House | rg1 7pt | Client1 | 9613905571 | 0753422499 | DISCON | 0118954
TempLa | 12168002 | Joe Blog | 07528945863 | 2 House | rg1 7pt | Client2 | 9613905572 | 0753422499 | DISCON | 0 1 954
TempLa | 12168003 | Joe Blog | 07528945863 | 2 House | rg1 7pt | Client3 | 9613905573 | 0753 42 24 | DISCON | 0118954
TempLa | 12168004 | Joe Blog | 07528945863 | 2 House | rg1 7pt | Client4 | 9613905574 | 0753422499 | . | 0118954
TempLa | 12168005 | Joe Blog | 07528945863 | 2 House | rg1 7pt | Client5 | 9613905575 | 0753422499 | DISCON | 0118954

The ERROR I'm running into is below; Please note, this happens at column "Text6".

Traceback (most recent call last):
  File "C:/Users/username/PycharmProjects/pythonProject/ManipulateCSVDataNRemovesErrors.py", line 19, in <module>
    df['Text6'] = df['Text6'].str.replace(" ", "")
  File "C:\Users\username\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\generic.py", line 5126, in __getattr__
    return object.__getattribute__(self, name)
  File "C:\Users\username\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\accessor.py", line 187, in __get__
    accessor_obj = self._accessor(obj)
  File "C:\Users\username\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\strings.py", line 2100, in __init__
    self._inferred_dtype = self._validate(data)
  File "C:\Users\username\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\strings.py", line 2157, in _validate
    raise AttributeError("Can only use .str accessor with string values!")
AttributeError: Can only use .str accessor with string values!
The code I've built is below; please be gentle, I know this is probably not the best way to arrange code or even design it.. Blush

import pandas as pd


data = pd.read_csv(r'EMAIL9th Sep 2020_000000.csv')
df = pd.DataFrame(data, columns=['Letter Name', 'Our_Reference', 'Debtors Fullname', 'Contact Name', 'Salutation',
                                 'Mailing_Add1', 'Mailing_Add2', 'Mailing_Add3', 'Mailing_Add4', 'Mailing_Add5',
                                 'Mailing_Pcode', 'Client Name', 'Client Reference', 'Collector', 'Currency',
                                 'Balance', 'Initial Payment', 'No. Of Payments', 'Amount Agreed', 'Payable',
                                 'Starting From', 'Final Payment', 'LG Principal', 'LG Interest', 'LG Court Costs',
                                 'LG Solictor Cost', 'LG Warrant Fee', 'LG Total Due', 'Creditor Name',
                                 'Letter Header', 'Collector', 'Debtor Email', 'Inbound Tel', 'Text11', 'Text12',
                                 'Text1', 'Text2', 'Text3', 'Text4', 'Text5', 'Text6', 'Text7', 'Text8', 'Text9',
                                 'Text10', 'Date1', 'Date2', 'Date3', 'Date4', 'Date5', 'Date6', 'Number1',
                                 'Number2', 'Number3', 'Number4'])


df['Inbound Tel'] = df['Inbound Tel'].str.replace(" ", "")
df['Text11'] = df['Text11'].str.replace(".", "")
df['Text6'] = df['Text6'].str.replace(" ", "")

df.to_csv(r'EMAIL9th Sep 2020_000000_Mod.CSV', index=False)
print(df)
Now I understand this error "AttributeError: Can only use .str accessor with string values!" or I think I do, sting's are variables that hold characters & an integer's are variables that hold numerical values.

The three specified columns "Inbound Tel", "Text11" & "Text6" are set to general formatting when opened in excel (however, this is always by default)... So my question is, why does the code work on column "Inbound Tel" & not "Text6"?

Ever article or solution on Google when it comes to white spaces has pointed me to the following;
df['ColunmName'] = df['ColunmName'].str.replace(" ", "")
.

I've also tried
df['ColunmName'] = df['ColunmName'].__int__.replace(" ", "")
, this was unsuccessful... The following does not work either
df['ColunmName'] = df['ColunmName'].replace(" ", "")
.

However the following does work
df['ColunmName'] = df['ColunmName'].replace(".", "")
on 'Text11' Huh

I have also changed the data to RAW text & then imported it into excel, saved it as a CSV & it has produced a different result... The code now stops at "Inbound Tel"; Please note, below ERROR is now at "Inbound Tel" Huh

Traceback (most recent call last): 
  File "C:/Users/username/PycharmProjects/pythonProject/ManipulateCSVDataNRemovesErrors.py", line 17, in <module>
    df['Inbound Tel'] = df['Inbound Tel'].str.replace(" ", "")
  File "C:\Users\username\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\generic.py", line 5126, in __getattr__
    return object.__getattribute__(self, name)
  File "C:\Users\username\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\accessor.py", line 187, in __get__
    accessor_obj = self._accessor(obj)
  File "C:\Users\username\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\strings.py", line 2100, in __init__
    self._inferred_dtype = self._validate(data)
  File "C:\Users\username\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\strings.py", line 2157, in _validate
    raise AttributeError("Can only use .str accessor with string values!")
AttributeError: Can only use .str accessor with string values!
Huh Cry

I've arrived at the conclusion that I need to format the CSV file to the correct formatting style before running the code I've created, I'm not sure where to start though - possibly "Styler.apply", but I've also now be left in limbo due to the above errors & all three columns being set to general...

Any help or advise on what the heck is going on would be would be much appreciated!

Look forward to your response Smile

Kind Regards,
Richard
Reply
#2
Hi traibr,

Try the following, I hope they work for you :-

df['Inbound Tel'] = df['Inbound Tel'].str.replace({" ": ""})
Or

df['Inbound Tel'] = df['Inbound Tel'].replace({" ": ""})
And

df['Text6'] = df['Text6'].str.replace({" ": ""})
Or

df['Text6'] = df['Text6'].replace({" ": ""})
Or insert the Line :-

df.columns = ['Inbound Tel', 'Text6']
Above these Lines in your Code :-

df['Inbound Tel'] = df['Inbound Tel'].str.replace(" ", "")
df['Text11'] = df['Text11'].str.replace(".", "")
df['Text6'] = df['Text6'].str.replace(" ", "")
Hash out the Inbound Tel, and Text6 Lines of the above lines of Code, and use the following line of Code instead,
in place of the hashed out lines of Code :-

df.columns = df.columns.str.replace(" ", "")
Or

df.columns = df.columns.str.replace(' ', '')
I hope some of those work for you.

Best Regards

Eddie ))
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  No Output In Pandas DataFrame Query eddywinch82 1 249 Aug-17-2020, 09:25 PM
Last Post: eddywinch82
  strange error from pandas dataframe djf123 1 669 Jul-27-2020, 05:25 AM
Last Post: scidam
  Pandas DataFrame not updating HelpMePlease 3 450 Jul-11-2020, 07:19 PM
Last Post: jefsummers
  Pandas DataFrame visual Truman 8 592 Jul-10-2020, 06:11 AM
Last Post: hussainmujtaba
  Pandas DataFrame and unmatched column sritsv19 0 431 Jul-07-2020, 12:52 PM
Last Post: sritsv19
  Pandas DataFrame Concatenate problems Kristenl2784 1 353 Jul-01-2020, 01:28 AM
Last Post: hussainmujtaba
  Difference of two columns in Pandas dataframe zinho 2 661 Jun-17-2020, 03:36 PM
Last Post: zinho
  error bars with dataframe and pandas Hucky 4 622 Apr-27-2020, 02:02 AM
Last Post: Hucky
  Python Pandas DataFrame Help AmericanEagle1989 1 410 Apr-12-2020, 12:37 PM
Last Post: AmericanEagle1989
  How does pyplot know what was plotted by the output of pandas.DataFrame(...).cumprod( codeowl 2 427 Mar-28-2020, 08:27 AM
Last Post: j.crater

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020