Python Forum
How do I remove spurious "." from a string?
#1
The goal is to remove erroneous "." characters from a string, i.e. to go from this:

"I a.m so pl.ea.sed to me.et y.ou. I ho.pe .tha.t th.is is t.he st.a.rt of a lo.n.g fri.en.dsh.ip."

to this:

"I am so pleased to meet you. I hope that this is the start of a long friendship."

The current code is:

import re

#create function to find, evaluate, and remove random "." 
def dot_hunt(text):
    #find full stop "." marks and notate their relative location in the string
    full_stop_locations=[]
    for i in range(len(text)):
        lttr=text[i]
        if lttr == ".":
            full_stop_locations.append(i)
        else:
            continue
    print("Full stop locations are: " + str(full_stop_locations))
    text2 = text.replace('.', '')
    print(text2)
    
    for i in range(len(text2)):
        substring = text2[i-1:i+2]
        if re.match(r"^[a-z]\s[A-Z]+$", substring):
            re.sub(r"\\s", r".\\s", text2[i])
            print(substring)
        else:
            continue
    
    print(text2)


    
    
#create variable to contain the string of output text from the OCR
text = input("Please insert the output text from the OCR:  \n>")
#create a variable which makes a function call and receives the returned text
new_text = dot_hunt(text)
Thank you for the help in advance.
Reply
#2
Please put your code in python bbcode tags so the indentation can be preserved.

What is your question about the code? It seems to work for some cases.

My attempt would probably be to just remove any period that is followed by a non-whitespace character.
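A minimal sketch of that idea, assuming a regular expression with a lookahead is acceptable here. The function name drop_inner_dots is made up for illustration; re.sub and the (?=...) lookahead are standard parts of the re module.

```python
import re

def drop_inner_dots(text):
    # Remove every "." that is immediately followed by a non-whitespace character.
    # A "." before whitespace, or at the very end of the string, is left alone.
    return re.sub(r"\.(?=\S)", "", text)

sample = ("I a.m so pl.ea.sed to me.et y.ou. I ho.pe .tha.t th.is is t.he "
          "st.a.rt of a lo.n.g fri.en.dsh.ip.")
cleaned = drop_inner_dots(sample)
print(cleaned)
```

This only works when a real sentence end is always followed by whitespace or the end of the text; a stray "." that happens to sit before a space would survive.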
Reply
#3
(Apr-10-2022, 06:06 PM)bowlofred Wrote: Please put your code in python bbcode tags so the indentation can be preserved.

What is your question about the code? It seems to work for some cases.

My attempt would probably be to just remove any period that is followed by a non-whitespace character.

import re

#create function to find, evaluate, and remove random "." 
def dot_hunt(text):
    #find full stop "." marks and notate their relative location in the string
    full_stop_locations=[]
    for i in range(len(text)):
        lttr=text[i]
        if lttr == ".":
            full_stop_locations.append(i)
        else:
            continue
    print("Full stop locations are: " + str(full_stop_locations))
    text2 = text.replace('.', '')
    print(text2)
    
    for i in range(len(text2)):
        substring = text2[i-1:i+2]
        if re.match(r"^[a-z]\s[A-Z]+$", substring):
            re.sub(r"\\s", r".\\s", text2[i])
            print(substring)
        else:
            continue
    
    print(text2)


    
    
#create variable to contain the string of output text from the OCR
text = input("Please insert the output text from the OCR:  \n>")
#create a variable which makes a function call and receives the returned text
new_text = dot_hunt(text)
Does that work?

*EDIT* Updated original post.
Reply
#4
It feels like homework.

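In Python, avoid looping over indices like this:

for i in range(len(text)):
    lttr = text[i]
    if lttr == ".":
        full_stop_locations.append(i)
    else:
        continue
Iterate over the characters directly instead (enumerate(text) gives you the index as well, if you need it):

for char in text:
    if char == '.':
        pass  # do something
One idea for approaching this problem (intentionally not a full solution; note that slicing with text[:-1] drops the last character):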

>>> import string
>>> text =  'I a.m so pl.ea.sed to me.et y.ou. I ho.pe .tha.t th.is is t.he st.a.rt of a lo.n.g fri.en.dsh.ip.'
>>> ''.join('' if char == '.' and text[i+1] not in string.whitespace else char for i, char in enumerate(text[:-1]))
'I am so pleased to meet you. I hope that this is the start of a long friendship'
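One way the idea above might be completed, offered only as a sketch: loop with enumerate over the whole string and treat the last character as a special case, so the closing full stop is kept. The name strip_spurious_dots is hypothetical.

```python
import string

def strip_spurious_dots(text):
    kept = []
    for i, char in enumerate(text):
        is_last = i == len(text) - 1
        # Skip a "." only when another, non-whitespace character follows it.
        if char == "." and not is_last and text[i + 1] not in string.whitespace:
            continue
        kept.append(char)
    return "".join(kept)

text = ("I a.m so pl.ea.sed to me.et y.ou. I ho.pe .tha.t th.is is t.he "
        "st.a.rt of a lo.n.g fri.en.dsh.ip.")
result = strip_spurious_dots(text)
print(result)
```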
Reply
#5
The tricky part is knowing where a sentence ends and not removing that full stop. The rest is easy!

Given "I a.m. John" and no intelligent analysis of the text, how do you know whether the sequence "full stop, space, capital letter" really ends a sentence?

And what if some ne'er-do-well starts the next sentence without a capital letter, because the first word is a brand name or something?
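To make both failure modes concrete, here is a rough sketch of the "full stop, space, capital letter" heuristic. It assumes a "." is genuine only when followed by whitespace plus an ASCII capital, or when it ends the text. The function name keep_sentence_dots and the second test sentence are invented for illustration.

```python
import re

def keep_sentence_dots(text):
    # Drop a "." unless it is followed by whitespace and a capital letter,
    # or by nothing but optional whitespace up to the end of the string.
    return re.sub(r"\.(?!\s+[A-Z]|\s*$)", "", text)

ambiguous = keep_sentence_dots("I a.m. John")
lowercase_start = keep_sentence_dots("I like it. iPhones are fine.")
print(ambiguous)
print(lowercase_start)
```

The first call keeps a full stop that was probably spurious ("I am. John"), and the second drops a real one because the next sentence starts with a lowercase letter.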

Best to track down whoever typed all those full stops and have a stern word with him or her.

Failing that, write in Chinese!
Reply
#6
I would use itertools.zip_longest(text, text[1:], fillvalue=" "). This would give me letter pairs a, b where b is the letter immediately following a. If a is a period I can look at b to decide if it should be included in the new string or not. This can be done with just one line of Python code and still be readable. If you are using much more than 5 lines of code you are making the problem too hard. Think simple.
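A sketch of how that might look, assuming "decide" means keeping a period only when the next character is whitespace. The function name remove_dots_pairwise is made up; itertools.zip_longest and its fillvalue parameter are standard.

```python
from itertools import zip_longest

def remove_dots_pairwise(text):
    # Pair each character a with its successor b; the final character is
    # paired with the fill value " ", so a trailing "." counts as sentence-final.
    pairs = zip_longest(text, text[1:], fillvalue=" ")
    return "".join(a for a, b in pairs if a != "." or b.isspace())

text = ("I a.m so pl.ea.sed to me.et y.ou. I ho.pe .tha.t th.is is t.he "
        "st.a.rt of a lo.n.g fri.en.dsh.ip.")
fixed = remove_dots_pairwise(text)
print(fixed)
```

Padding with a space is what lets the last full stop survive without a separate end-of-string check.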
Reply
#7
Hello,
you can do this:
>>> text =  'I a.m so pl.ea.sed to me.et y.ou. I ho.pe .tha.t th.is is t.he st.a.rt of a lo.n.g fri.en.dsh.ip.'
>>> ". ".join(sentence.replace(".", "") for sentence in text.split(". "))
'I am so pleased to meet you. I hope that this is the start of a long friendship'
Reply
#8
"Sentence" has no definition in English Grammar. "sentence" is a term borrowed from Propositional Logic.

Very hard to get Python to know where the sentence ends. People don't even know how they know that!

Quote:
>>> text = 'I. a.m. so pl.ea.sed to me.et y.ou I ho.pe .tha.t. th.is is t.he st.a.rt of a lo.n.g fri.en.dsh.ip. Ur'
>>> ". ".join(sentence.replace(".", "") for sentence in text.split(". "))
'I. am. so pleased to meet you I hope that. this is the start of a long friendship. Ur'
Reply

