Python Forum
clean unicode string to contain only characters from some unicode blocks
#1
Hi,

I have a Unicode string and I need to remove all characters that are not part of the Basic Latin and Latin-1 Supplement Unicode blocks.
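The two blocks are adjacent, so together they form one contiguous range of code points, U+0000 to U+00FF (256 code points). That range is also exactly the repertoire of the ISO-8859-1 ("latin-1") codec. A small Python 3 check of those facts, where the name `ALLOWED` is just for illustration:

```python
# Basic Latin (U+0000..U+007F) and Latin-1 Supplement (U+0080..U+00FF)
# are adjacent, so the allowed set is a single contiguous range.
ALLOWED = range(0x0000, 0x00FF + 1)  # range() excludes its stop value, hence + 1

print(len(ALLOWED))        # 256
print(0xFF in ALLOWED)     # True: U+00FF is the last allowed code point
print(0x100 in ALLOWED)    # False: U+0100 is the first code point of Latin Extended-A

# The 'latin-1' codec maps byte b to code point b for every b in 0..255,
# so its repertoire is exactly the allowed range.
print(all(bytes([b]).decode("latin-1") == chr(b) for b in ALLOWED))  # True
```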

The only way I could get it to work is the following:

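# Python 2: `string` is a UTF-8 encoded byte string
#0000..007F; Basic Latin
#0080..00FF; Latin-1 Supplement
# range() excludes its stop value, so 256 is needed to include U+00FF
allowed_chars = map(lambda x: unichr(x).encode('utf-8'), range(0, 256))
# decode to unicode, keep only allowed characters, re-encode each kept one as UTF-8
clean_string = ''.join(char.encode('utf-8') for char in unicode(string, 'utf-8') if char.encode('utf-8') in allowed_chars)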
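The snippet above is Python 2 (`unichr`, `unicode`). A minimal Python 3 sketch of the same filter, assuming, as in the original, that the input arrives as UTF-8 bytes and UTF-8 bytes are wanted back; the function name `clean_utf8` is hypothetical. Comparing each character's code point directly avoids building a lookup list and re-encoding every character just to test membership.

```python
def clean_utf8(data: bytes) -> bytes:
    """Keep only Basic Latin and Latin-1 Supplement characters (U+0000..U+00FF).

    Hypothetical helper: decodes UTF-8 bytes, filters by code point,
    and re-encodes the result as UTF-8.
    """
    text = data.decode("utf-8")
    kept = "".join(ch for ch in text if ord(ch) <= 0xFF)
    return kept.encode("utf-8")


raw = "\u00ffes \u20ac5 caf\u00e9".encode("utf-8")
print(clean_utf8(raw).decode("utf-8"))  # ÿes 5 café
```

The `ord(ch) <= 0xFF` test is constant time per character, whereas `in` on the list built by `map` in Python 2 scans up to 256 entries per character.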

Is there a better way? (Better meaning clearer to read, more efficient, etc.)
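Two shorter Python 3 options, sketched under the assumption that the input is already a `str`. Because the "latin-1" codec can encode exactly U+0000..U+00FF, encoding with `errors="ignore"` silently drops everything outside the two blocks. A regular expression with a negated character class does the same and extends to any set of blocks by listing more ranges; the `ALLOWED_RANGES` table and the function names are hypothetical.

```python
import re


def clean_codec(text: str) -> str:
    # 'latin-1' encodes exactly U+0000..U+00FF; errors="ignore" drops the rest.
    return text.encode("latin-1", errors="ignore").decode("latin-1")


# Hypothetical table of allowed blocks as inclusive (first, last) code points.
ALLOWED_RANGES = [
    (0x0000, 0x007F),  # Basic Latin
    (0x0080, 0x00FF),  # Latin-1 Supplement
]

# Negated character class matching runs of disallowed characters,
# e.g. [^\U00000000-\U0000007f\U00000080-\U000000ff]+
_DISALLOWED = re.compile(
    "[^"
    + "".join(f"\\U{lo:08x}-\\U{hi:08x}" for lo, hi in ALLOWED_RANGES)
    + "]+"
)


def clean_regex(text: str) -> str:
    # Delete every run of characters outside the allowed ranges.
    return _DISALLOWED.sub("", text)


sample = "\u00ffes\u20acno\u4e2d"
print(clean_codec(sample))  # ÿesno
print(clean_regex(sample))  # ÿesno
```

The codec trick only works because the wanted set happens to coincide with Latin-1; for any other combination of blocks, the regex (or a plain `ord()` range test) is the general approach.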

Thank you for your help. Regards,
Giulio


Messages In This Thread
clean unicode string to contain only characters from some unicode blocks - by gmarcon - Nov-22-2018, 06:24 PM
