Posts: 35
Threads: 12
Joined: Apr 2019
Apr-26-2019, 06:18 AM
(This post was last modified: Apr-26-2019, 06:18 AM by stahorse.)
Hi,
I'm trying to identify two specific words next to each other and remove them from a string. I thought of using REPLACE, but that seems to work on only one word.
E.g. I would receive text like this:
message = '''
Good Morning
We need your input please.
Vriendelike groete/ Kind regards
Badu Thusong
Direct tel: 021 974 7313 | Email:
Badu@thusong.com
Dear Boss
How are you
today
Your number
Branch Agency: Meme
Branch Agency Code: 0329271
Thank you for contacting us
Kind regards
Agriculture Contact Centre
So I want to go through it, and whenever I find "Kind regards" or "Vriendelike groete/ Kind regards", I want to remove it from the message.
Any advice?
Posts: 1,950
Threads: 8
Joined: Jun 2018
Apr-26-2019, 06:35 AM
(This post was last modified: Apr-26-2019, 06:35 AM by perfringo.)
What's wrong with str.replace?
In [1]: for phrase in ['Kind regards', 'Vriendelike groete/ Kind regards']:
   ...:     message = message.replace(phrase, '')
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy
Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Posts: 35
Threads: 12
Joined: Apr 2019
It removes "Kind regards" from the first email in the text, but everything below still shows "Kind regards".
Posts: 1,950
Threads: 8
Joined: Jun 2018
My mistake - the order should be reversed. Otherwise 'Kind regards' is replaced on the first pass of the loop and the second phrase no longer matches (its 'Kind regards' part has already been removed).
In [1]: message = '\nGood Morning\n\nWe need your input please.\n\n\nVriendelike groete/ Kind regards\n\nBadu Thusong\n\nDirect tel: 021 974 7313 | Email:\nBadu@thusong.com\n\n\nDear Boss\n\n\nHow are you\n\ntoday\n\nYour number\n\nBranch Agency: Meme\n\nBranch Agency Code: 0329271\n\nThank you for contacting us\n\n\nKind regards\n\nAgriculture Contact Centre'
In [2]: for phrase in ['Vriendelike groete/ Kind regards', 'Kind regards']:
   ...:     message = message.replace(phrase, '')
   ...:
In [3]: message
Out[3]: '\nGood Morning\n\nWe need your input please.\n\n\n\n\nBadu Thusong\n\nDirect tel: 021 974 7313 | Email:\nBadu@thusong.com\n\n\nDear Boss\n\n\nHow are you\n\ntoday\n\nYour number\n\nBranch Agency: Meme\n\nBranch Agency Code: 0329271\n\nThank you for contacting us\n\n\n\n\nAgriculture Contact Centre'
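The ordering problem described above can also be sidestepped by sorting the phrases longest-first before looping, so the order in which they are typed no longer matters. A minimal sketch; the helper name `remove_phrases` and the shortened sample text are made up for illustration and are not from the original post:

```python
def remove_phrases(text, phrases):
    """Remove every occurrence of each phrase, trying the longest phrase first."""
    # Longest first, so 'Kind regards' cannot eat part of the longer phrase
    for phrase in sorted(phrases, key=len, reverse=True):
        text = text.replace(phrase, '')
    return text


# Shortened stand-in for the message in this thread (hypothetical sample)
sample = 'Hello\nVriendelike groete/ Kind regards\nBadu\nThanks\nKind regards\nCentre'
cleaned = remove_phrases(sample, ['Kind regards', 'Vriendelike groete/ Kind regards'])
print(repr(cleaned))
```

Note that str.replace is still case-sensitive here, so 'Kind Regards' with a capital R would be left in place.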
Posts: 24
Threads: 11
Joined: Apr 2019
Apr-26-2019, 07:40 AM
(This post was last modified: Apr-26-2019, 07:40 AM by NewBeie.)
I also have the same problem.
Posts: 35
Threads: 12
Joined: Apr 2019
My code doesn't recognize "Kind Regards" at all on its own; it only removes it when it is next to "Vriendelike groete":
import re
from bs4 import BeautifulSoup
import string
message = '''
Dear Sir
What do you think
Kind regards
Badu Thusong
Direct tel: 021 974 7313 | Email:
[email protected]
Good Morning
We need your input please.
Vriendelike groete/ Kind regards
Badu Thusong
Direct tel: 021 974 7313 | Email:
[email protected]
Dear Boss
How are you
today
Your number
Branch Agency: Meme
Branch Agency Code: 0329271
Thank you for contacting us
Kind regards
Agriculture Contact Centre
'''
for phrase in ['Vriendelike groete/Kind regards', 'Vriendelike groete/ Kind regards']:
    text = message.replace(phrase, '')
print(text)
Output:
Dear Sir
What do you think
Kind regards
Badu Thusong
Direct tel: 021 974 7313 | Email:
[email protected]
Good Morning
We need your input please.
Badu Thusong
Direct tel: 021 974 7313 | Email:
[email protected]
Dear Boss
How are you
today
Your number
Branch Agency: Meme
Branch Agency Code: 0329271
Thank you for contacting us
Kind regards
Agriculture Contact Centre
Posts: 2,128
Threads: 11
Joined: May 2017
Apr-26-2019, 08:08 AM
(This post was last modified: Apr-26-2019, 08:08 AM by DeaD_EyE.)
If you do the replacement with replace, you'll run into issues with spelling and upper/lowercase variations.
You can use regular expressions for this task. (Don't use them to parse HTML.)
Here is a simple example:
andre@andre-GP70-2PE:~$ ipython
Python 3.7.3 (default, Apr 15 2019, 14:17:18)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.4.0 -- An enhanced Interactive Python. Type '?' for help.
...:
...: Vriendelike groete/ Kind regards
...:
...: Badu Thusong
...:
...: Direct tel: 021 974 7313 | Email:
...: [email protected]
...:
...:
...: Dear Boss
...:
...:
...: How are you
...:
...: today
...:
...: Your number
...:
...: Branch Agency: Meme
...:
...: Branch Agency Code: 0329271
...:
...: Thank you for contacting us
...:
...:
...: Kind regards
...:
...: Agriculture Contact Centre
...: '''
In [2]: import re
In [3]: re.sub(r'[kK]ind [rR]egards', 'Best greetings', message)
Out[3]: '\nDear Sir\n \nWhat do you think\n \n \nBest greetings\n \nBadu Thusong\n \nDirect tel: 021 974 7313 | Email:\[email protected]\n \nGood Morning\n \nWe need your input please.\n \n \nVriendelike groete/ Best greetings\n \nBadu Thusong\n \nDirect tel: 021 974 7313 | Email:\[email protected]\n \n \nDear Boss\n \n \nHow are you\n \ntoday\n \nYour number\n \nBranch Agency: Meme\n \nBranch Agency Code: 0329271\n \nThank you for contacting us\n \n \nBest greetings\n \nAgriculture Contact Centre\n'
In [4]: print(re.sub(r'[kK]ind [rR]egards', 'Best greetings', message))
Dear Sir
What do you think
Best greetings
Badu Thusong
Direct tel: 021 974 7313 | Email:
[email protected]
Good Morning
We need your input please.
Vriendelike groete/ Best greetings
Badu Thusong
Direct tel: 021 974 7313 | Email:
[email protected]
Dear Boss
How are you
today
Your number
Branch Agency: Meme
Branch Agency Code: 0329271
Thank you for contacting us
Best greetings
Agriculture Contact Centre
The regex matches on:
- kind regards
- kind Regards
- Kind regards
- Kind Regards
I always point to regex101.com, because you can test your regex there.
There are also offline tools for checking a regex. Regex is not very easy at the beginning, but the more you use it, the more you like it for text processing.
But never forget: regex is not a good solution for everything.
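Building on the regex above, here is a sketch that removes the sign-off instead of replacing it, and that also takes the optional "Vriendelike groete/" prefix with it. It uses re.IGNORECASE rather than character classes, which is a swapped-in variation and not the pattern from this post; the names `SIGN_OFF` and `strip_sign_offs` and the short sample string are hypothetical:

```python
import re

# Optional "Vriendelike groete/" prefix, then "Kind regards", in any letter case
SIGN_OFF = re.compile(r'(?:vriendelike groete/\s*)?kind regards', re.IGNORECASE)


def strip_sign_offs(text):
    """Return text with every matched sign-off removed."""
    return SIGN_OFF.sub('', text)


# Shortened stand-in for the message in this thread (hypothetical sample)
sample = 'Hi\nVriendelike groete/ Kind regards\nBadu\nThanks\nkind Regards\nCentre'
stripped = strip_sign_offs(sample)
print(stripped)
```

Because the prefix is optional inside a single pattern, there is no ordering problem to worry about: one pass handles both forms.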
Posts: 1,950
Threads: 8
Joined: Jun 2018
(Apr-26-2019, 07:50 AM)stahorse Wrote: My code doesn't recognize "Kind Regards" at all on it's own, it only removes it when it is next to "Vriendelike groete"
This is expected behaviour: your code replaces only instances of 'Vriendelike groete/Kind regards' and 'Vriendelike groete/ Kind regards'. It can't and shouldn't replace the standalone phrase 'Kind regards'.
I suspect that this is homework. If so, .replace is probably what your teachers want you to learn. If it's a real-life scenario, go with the solution provided by DeaD_EyE. However, it beats me why someone would need to replace some (insignificant) part of a string for real. Usually one must retrieve data from a string, not replace it. The resulting string is no better for retrieving, parsing, or structuring the data it contains. But that's just me.
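For reference, two things in the quoted loop keep the standalone phrase in place: the list never contains 'Kind regards' by itself, and `text = message.replace(...)` starts again from the untouched `message` on every pass, so only the last replacement survives. A sketch of an adjusted loop, using a shortened stand-in for the message (the sample string is an assumption, not the original text):

```python
# Shortened stand-in for the message in this thread (hypothetical sample)
message = 'What do you think\nKind regards\nBadu\nVriendelike groete/ Kind regards\nBadu'

text = message  # start from the original once, then keep updating the same variable
for phrase in ['Vriendelike groete/Kind regards',
               'Vriendelike groete/ Kind regards',
               'Kind regards']:  # the standalone phrase goes last
    text = text.replace(phrase, '')

print(text)
```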
Posts: 35
Threads: 12
Joined: Apr 2019
Apr-26-2019, 09:05 AM
(This post was last modified: Apr-26-2019, 09:07 AM by stahorse.)
lol I'm not a student; this is my 4th week learning Python. I've now upgraded to looking at other people's code here at work, and I'm trying to play around with it. Otherwise I'm a BI Developer, trying to learn a new skill.
Posts: 1,950
Threads: 8
Joined: Jun 2018
(Apr-26-2019, 09:05 AM)stahorse Wrote: lol I'm not a student, this is my 4th week learning Python. I've now upgraded to looking at other people's code here at work and I'm trying to play around it. Otherwise I'm a BI Developer, trying to learn a new skill.
Enjoy your journey in the wonderful world of Python!
My comment was on practical grounds: only homework tends to ask for something that is useless. And this is a kind of homework you assigned to yourself.
You can try a 'practical' application as well:
if 'regards' in e_mail_body.lower():
    print('Polite person')
else:
    print('Impolite or foreigner')