Nov-19-2019, 12:10 PM
Hi,
I have a corpus which contains sentences in a certain pattern, which I would like to change by applying Regex.
The pattern is: [a certain set of words][word1], [a certain set of words][word2], [a certain set of words][word3], etc.
It should be converted to: [a certain set of words][word1], [word2] or [word3]
A few examples:
So whether it is the body, whether it is the mind, whether it is the energy, whether it is the emotions.
Changes to:
So whether it is the body, mind, energy or emotions.
So whether it is the body, whether it is the mind, whether it is the energy.
Changes to:
So whether it is the body, mind or energy.
So he cannot eat, he cannot sleep.
Changes to:
So he cannot eat or sleep.
The regexes I'm currently using, one for each number of repetitions, are:
fulltext = re.sub(r"((\b\S+\b\s){1,4})(\b\S+\b)[,]\s\1(\b\S+\b)[,]\s\1(\b\S+\b)[,]\s\1", r"\1\3, \4, \5 or ", fulltext)
fulltext = re.sub(r"((\b\S+\b\s){1,4})(\b\S+\b)[,]\s\1(\b\S+\b)[,]\s\1", r"\1\3, \4 or ", fulltext)
fulltext = re.sub(r"((\b\S+\b\s){1,4})(\b\S+\b)[,]\s\1", r"\1\3 or ", fulltext)
I was wondering whether there is a single regex I can apply to all of these, and that also covers the more general case where the pattern repeats any number of times.
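For reference, one possible way to handle all three cases (and longer runs) in a single pass is to match the prefix once, capture the whole run of ", prefix word" repetitions as one group, and build the replacement in a callback passed to re.sub. This is only a sketch under some assumptions: words are matched with \w+ rather than \b\S+\b, so words containing hyphens or apostrophes will not match; the prefix is still limited to 1 to 4 words as in the patterns above; and collapse_repeats is a hypothetical helper name, not part of any library.

```python
import re

# Assumption: words are runs of \w characters (no hyphens or apostrophes),
# and the repeated prefix is 1 to 4 words long, as in the patterns above.
# Group 1 = prefix, group 2 = first word, group 3 = all ", prefix word" repeats.
PATTERN = re.compile(r"((?:\b\w+\s){1,4})(\w+)((?:,\s\1\w+)+)")


def collapse_repeats(text):
    """Hypothetical helper: rewrite 'P a, P b, P c' as 'P a, b or c'."""

    def _join(match):
        prefix, first, tail = match.group(1), match.group(2), match.group(3)
        # Pull out the word that follows each repeated prefix in the tail.
        rest = re.findall(r",\s" + re.escape(prefix) + r"(\w+)", tail)
        words = [first] + rest
        # Comma-separate all but the last word, then join the last with "or".
        return prefix + ", ".join(words[:-1]) + " or " + words[-1]

    return PATTERN.sub(_join, text)


print(collapse_repeats("So he cannot eat, he cannot sleep."))
```

Because the repetition count is left open with +, the same call should cover two, three, four or more repetitions; the callback is needed because a plain replacement string cannot refer to every iteration of a repeated group.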