Python Forum
How to Remove Non-ASCII Characters But Leave Line Breaks In Place?
#1
I have a function in a Python script that serves to remove non-ASCII characters from strings before these strings are ultimately saved to an Oracle database.

		
# This should remove control characters 0-31 and everything from 127 up.
sCleanedString = re.sub(r'[^\x20-\x7E]', '', sStringToClean)
When I pass in a large string containing the full content of an email message, the function strips out the line break characters, leaving a cleaned string that is jumbled together with no line breaks. For this special case, I'd like to clean the string but keep the line break characters.

Any suggestions on how to modify the above regex to do this?

Thanks!
#2
Don't use regex for this.

def strip_ascii(text):
    return "".join(
        char for char
        in text
        if 31 < ord(char) < 127
    )
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
#3
# Keep LF (\x0A), CR (\x0D), and printable ASCII; remove everything else.
sCleanedString = re.sub(r'[^\x0A\x0D\x20-\x7E]', '', sStringToClean)
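A minimal, runnable sketch of the regex approach, assuming the goal is to drop (rather than replace) every character outside printable ASCII while keeping `\n` and `\r`. The names `CLEAN_PATTERN` and `clean_keep_linebreaks` are hypothetical and do not come from the original script.

```python
import re

# Matches any character that is NOT LF, CR, or printable ASCII (0x20-0x7E).
# Hypothetical name; compiled once so repeated calls reuse the pattern.
CLEAN_PATTERN = re.compile(r'[^\n\r\x20-\x7E]')


def clean_keep_linebreaks(text):
    """Remove control and non-ASCII characters, keeping line breaks."""
    return CLEAN_PATTERN.sub('', text)


sample = "Grüße,\r\nline two\ttabbed\nline three\x00"
print(repr(clean_keep_linebreaks(sample)))
# prints 'Gre,\r\nline twotabbed\nline three'
```

Note that tabs (`\t`) are removed as well; adding `\t` to the character class would keep them, if that is wanted.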
#4
Many thanks for the couple of recommendations on how to address the non-ASCII character situation without removing line break characters!
#5
def strip_ascii(text):
    # Keep printable ASCII (32-126) plus LF and CR; drop everything else.
    return "".join(
        char for char in text
        if 31 < ord(char) < 127 or char in "\n\r"
    )
Output:
In [21]: print(strip_ascii("""
    ...:
    ...: Das ist ein Test
    ...: ^^^^
    ...: ================
    ...: ****************
    ...: """))

Das ist ein Test
^^^^
================
****************
The condition in this function can still be optimized:
31 < ord(char) < 127 or char in "\n\r"
Edit: Second example
Output:
In [24]: print(strip_ascii("""
    ...:
    ...: ääääääääöööööDas ist ein Testööüüüüüüüü
    ...: ^^^^ßßßßß°°°°°°°
    ...: =°=°=°=°=°=°=°=°=°=°=°=°=°=°=°=
    ...: ****************???
    ...: """))

Das ist ein Test
^^^^
================
****************
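The post does not say which optimization is meant. One possibility, sketched here as an assumption, is to precompute the set of allowed characters once so that each character needs only a single set lookup instead of an `ord()` call plus two comparisons. The names `ALLOWED_CHARS` and `strip_non_printable` are hypothetical.

```python
# Printable ASCII (code points 32-126) plus LF and CR, built once at import time.
# Hypothetical name, not from the thread.
ALLOWED_CHARS = frozenset(chr(code) for code in range(32, 127)) | {"\n", "\r"}


def strip_non_printable(text):
    """Keep only printable ASCII characters and line breaks."""
    return "".join(char for char in text if char in ALLOWED_CHARS)


sample = "ääDas ist ein Testöö\r\n^^^^ßß\x00\x7f"
print(repr(strip_non_printable(sample)))
# prints 'Das ist ein Test\r\n^^^^'
```

Whether this is measurably faster depends on the input size and Python version, so it is worth timing against the original on real data before switching.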