Python Forum
Need to replace (remove) Unicode characters in text
#1
Before posting here I researched the subject of Unicode replacement, but got nowhere. I am using the Python 3 version of Autokey, with which I want to run a script to clean up scanned text. The replacement character (U+FFFD) is scattered all over the document, and I want to replace each occurrence with "" (empty string). Please see the sample text below the code. Can someone in the know suggest what I am missing?

my attempt
ctrla = "<ctrl>+a"
ctrlc  = "<ctrl>+c"
keyboard.send_keys(ctrla + ctrlc) 
page = clipboard.get_selection()
char = u"\uFFFD"
page.replace(char,"")
ctrlv = "<ctrl>+v"
keyboard.send_keys(ctrlv) #test code to see contents of clipboard.
sample text
Quote:��To summarize: Man is evolving and in that evolution he has lost
some physical traits and gained some mental ones. But neither in
their losses nor in their gains have all strains evolved to the same extent.
Some races have lost the skin pigment, but others have made little
progress in this direction. We are getting rid of our body coat of
hair, but the Akkas of the Upper Nile and special smaller strains have
a very hairy body, and so appendix and tail (coccyx) show variations
that run in families. Likewise in the acquisition of mental traits,
whole races differ in their ability to speak, to count, to foresee. The
Ethiopian has no more need for thrift than the tropical monkey and
has not acquired it. It is not surprising that there are strains, even

��
#2
The line
page.replace(char,"")
doesn't do what you seem to think. It takes the string page and generates a new string with the replacement completed, but then nothing happens with that generated string, because Python strings are immutable and replace() never changes the original. You could update the contents of the variable by putting page = at the start of the line, but that still wouldn't fully accomplish what you want: the script pastes whatever is on the clipboard, and the clipboard still holds the uncleaned text. After you capture the result of the replacement, you'd then need to do something like clipboard.set_selection(updated_page) to mirror the get. (I don't see a definition of the clipboard object and I'm not familiar with Autokey, so that method name is a guess, but I imagine an API exists for what you want.)
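To make the point about the discarded return value concrete, here is a minimal sketch in plain Python that runs without Autokey. The helper name strip_replacement_chars and the shortened sample string are made up for illustration; only str.replace is standard Python.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, the Unicode replacement character


def strip_replacement_chars(text: str) -> str:
    """Return a copy of text with every U+FFFD removed (hypothetical helper)."""
    # str.replace never modifies its input; it returns a new string.
    return text.replace(REPLACEMENT_CHAR, "")


# Shortened stand-in for the scanned sample text.
page = "\ufffd\ufffdTo summarize: Man is evolving\ufffd"

# Same mistake as in the original script: the result is thrown away.
page.replace(REPLACEMENT_CHAR, "")
assert REPLACEMENT_CHAR in page  # page is unchanged

# Capturing the result is what actually updates the variable.
page = strip_replacement_chars(page)
assert page == "To summarize: Man is evolving"
```

In the Autokey script the cleaned string would still have to be written back before sending <ctrl>+v. As far as I can tell, the Autokey scripting API offers clipboard.fill_clipboard(...) and clipboard.fill_selection(...) as the counterparts of the get methods, but treat that as an assumption and check it against the documentation for your installed version.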