The goal is to remove erroneous "." characters from a string, i.e. to go from this:
"I a.m so pl.ea.sed to me.et y.ou. I ho.pe .tha.t th.is is t.he st.a.rt of a lo.n.g fri.en.dsh.ip."
to this:
"I am so pleased to meet you. I hope that this is the start of a long friendship."
The current code is:
import re

#create function to find, evaluate, and remove random "."
def dot_hunt(text):
    #find full stop "." marks and notate their relative location in the string
    full_stop_locations=[]
    for i in range(len(text)):
        lttr=text[i]
        if lttr == ".":
            full_stop_locations.append(i)
        else:
            continue
    print("Full stop locations are: " + str(full_stop_locations))
    text2 = text.replace('.', '')
    print(text2)
    for i in range(len(text2)):
        substring = text2[i-1:i+2]
        if re.match(r"^[a-z]\s[A-Z]+$", substring):
            re.sub(r"\\s", r".\\s", text2[i])
            print(substring)
        else:
            continue
    print(text2)

#create variable to contain the string of output text from the OCR
text = input("Please insert the output text from the OCR: \n>")

#create a variable which makes a function call and receives the returned text
new_text = dot_hunt(text)

Thank you for the help in advance.
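For illustration, here is a minimal sketch of one possible heuristic, not a definitive fix. It swaps the "remove every dot, then re-insert" approach for a single substitution: a "." is kept only when it is followed by whitespace and an uppercase letter, or when it ends the string, and every other "." is deleted. The function name remove_stray_dots is hypothetical, and the rule assumes that real sentences always start with a capital letter, so it will misjudge a stray dot that happens to sit before a capitalised word.

```python
import re

# Assumption: a genuine full stop is followed by whitespace and an uppercase
# letter, or it is the last non-whitespace character of the text.
# Any "." that fails this test is treated as OCR noise and removed.
STRAY_DOT = re.compile(r"\.(?!\s+[A-Z]|\s*$)")


def remove_stray_dots(text):
    """Return text with every '.' removed except likely sentence endings."""
    return STRAY_DOT.sub("", text)


sample = (
    "I a.m so pl.ea.sed to me.et y.ou. I ho.pe .tha.t th.is is t.he "
    "st.a.rt of a lo.n.g fri.en.dsh.ip."
)
print(remove_stray_dots(sample))
```

Note that, unlike dot_hunt above, this function returns the cleaned string, so the result of the call can be stored in a variable such as new_text.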