Python Forum
clean unicode string to contain only characters from some unicode blocks
#3
I think you could use the re module for better performance:
import re
# Python 2: delete every run of characters outside U+0000..U+00FF (Latin-1)
regex = re.compile(ur"[^\x00-\xff]+")
clean_string = regex.sub(u"", string.decode('utf8')).encode('utf8')
Why use Python 2?
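For comparison, here is a minimal Python 3 sketch of the same idea, generalized so that the allowed Unicode blocks are a parameter instead of being hard-coded to Latin-1. The helper names (make_block_filter, clean, LATIN1, LATIN1_AND_GREEK) and the extra Greek range are illustrative assumptions, not part of the post above. In Python 3, str is already Unicode, so the decode/encode round trip is only needed when the input arrives as bytes. The \U escapes inside the pattern require Python 3.3 or later.

```python
import re

# Assumed default: the range used in the post, U+0000..U+00FF
# (the Basic Latin and Latin-1 Supplement blocks).
LATIN1 = [(0x0000, 0x00FF)]

# Hypothetical extra choice: also keep Greek and Coptic (U+0370..U+03FF).
LATIN1_AND_GREEK = LATIN1 + [(0x0370, 0x03FF)]


def make_block_filter(blocks):
    """Compile a regex that matches runs of characters lying outside
    every (first, last) code point range in `blocks`."""
    parts = "".join(
        "\\U{:08x}-\\U{:08x}".format(first, last) for first, last in blocks
    )
    return re.compile("[^" + parts + "]+")


def clean(text, blocks=LATIN1):
    """Return `text` with every character outside `blocks` removed.
    The re module caches compiled patterns, so repeated calls are cheap."""
    return make_block_filter(blocks).sub("", text)


if __name__ == "__main__":
    print(clean("café ☕ naïve"))                 # ☕ is outside Latin-1
    print(clean("αβγ abc ☕", LATIN1_AND_GREEK))  # Greek kept, ☕ dropped
    # UTF-8 bytes in, UTF-8 bytes out, like the Python 2 snippet:
    data = "naïve ☕".encode("utf-8")
    print(clean(data.decode("utf-8")).encode("utf-8"))
```

Building the character class from (first, last) pairs keeps the block list readable; the exact block boundaries are worth checking against the Unicode Blocks.txt data file before relying on them.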


Messages In This Thread
RE: clean unicode string to contain only characters from some unicode blocks - by Gribouillis - Nov-23-2018, 09:17 PM
