Nov-22-2018, 06:24 PM
Hi,
I have a unicode string and I need to remove all characters that are not part of the Latin-1 and Latin-1 Supplement Unicode block.
The only way I could get it works is the following:
#0000..007F; Basic Latin
#0080..00FF; Latin-1 Supplement
allowed_chars = map(lambda x: unichr(x).encode('utf-8'), range(0,255))
clean_string = ''.join(char.encode('utf-8') for char in unicode(string,'utf-8') if char.encode('utf-8') in allowed_chars)
Is there a better way? (better = code clearer to read, code efficient, ...)
Thank you for your precious support and regards,
Giulio
I have a unicode string and I need to remove all characters that are not part of the Latin-1 and Latin-1 Supplement Unicode block.
The only way I could get it works is the following:
#0000..007F; Basic Latin
#0080..00FF; Latin-1 Supplement
allowed_chars = map(lambda x: unichr(x).encode('utf-8'), range(0,255))
clean_string = ''.join(char.encode('utf-8') for char in unicode(string,'utf-8') if char.encode('utf-8') in allowed_chars)
Is there a better way? (better = code clearer to read, code efficient, ...)
Thank you for your precious support and regards,
Giulio