Python Forum
Need to replace (remove) Unicode characters in text - Printable Version




Need to replace (remove) Unicode characters in text - ineuw - Jan-02-2018

Before posting here I researched the subject of Unicode replacement, but got nowhere. I am using the Python 3 version of Autokey, with which I want to run a script to clean up scanned text. The replacement character (U+FFFD) is scattered all over the document and I want to replace every occurrence with "" (the empty string). Please see the sample text below the code. Can someone in the know suggest what I am missing?

my attempt
ctrla = "<ctrl>+a"
ctrlc  = "<ctrl>+c"
keyboard.send_keys(ctrla + ctrlc) 
page = clipboard.get_selection()
char = u"\uFFFD"
page.replace(char,"")
ctrlv = "<ctrl>+v"
keyboard.send_keys(ctrlv) #test code to see contents of clipboard.
sample text
Quote:��To summarize: Man is evolving and in that evolution he has lost
some physical traits and gained some mental ones. But neither in
their losses nor in their gains have all strains evolved to the same extent.
Some races have lost the skin pigment, but others have made little
progress in this direction. We are getting rid of our body coat of
hair, but the Akkas of the Upper Nile and special smaller strains have
a very hairy body, and so appendix and tail (coccyx) show variations
that run in families. Likewise in the acquisition of mental traits,
whole races differ in their ability to speak, to count, to foresee. The
Ethiopian has no more need for thrift than the tropical monkey and
has not acquired it. It is not surprising that there are strains, even

��



RE: Need to replace (remove) Unicode characters in text - micseydel - Jan-02-2018

The line
page.replace(char,"")
doesn't do what you seem to think. Python strings are immutable, so it takes the string page and generates a new one with the replacement completed, but then nothing happens with that generated string. You could update the contents of the variable by putting page = at the start of the line, but that still wouldn't fully accomplish what you want. After you capture the result of the replacement, you'd then need to do something like clipboard.set_selection(updated_page) to mirror the get. (I don't see a definition of that variable and I'm not familiar with Autokey, but I imagine an API exists for what you want.)
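
To make the point about the discarded result concrete, here is a minimal sketch in plain Python, with no Autokey involved. The helper name strip_replacement_chars and the sample string are made up for illustration. It only shows that str.replace returns a new string and leaves the original untouched.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def strip_replacement_chars(text: str) -> str:
    """Return a copy of text with every U+FFFD removed (hypothetical helper)."""
    return text.replace(REPLACEMENT_CHAR, "")


# Illustrative stand-in for the text copied from the document.
page = "\ufffd\ufffdTo summarize: Man is evolving\ufffd"

# The return value is thrown away here, so page still contains U+FFFD.
page.replace(REPLACEMENT_CHAR, "")
still_dirty = REPLACEMENT_CHAR in page

# Rebinding the name to the returned string is what actually keeps the result.
page = strip_replacement_chars(page)
print(still_dirty, repr(page))
```

In the Autokey script the cleaned string would still have to be written back to the selection or clipboard before pasting. The set_selection name above is only my guess, so check the Autokey clipboard API for the actual setter.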