Python Forum

Full Version: Regular Expression for matching words
Hello. I'm not sure about the re pattern I defined for this exercise, especially how to match the header text between the == on both sides so that the whole header is removed. Could you please give me any tips? Thank you! Angel

My exercise:

Wikipedia uses two or more equal signs == to mark headers and subheaders in the articles (e.g. ==, ===, ====).

In all cases, the equal signs and the actual header text are separated by spaces on both sides, e.g. == History == or === Further reading ===.

Import the re module and define a regular expression that removes all headers and subheaders from the articles. Store this regular expression under a variable named pattern.

Apply the regular expression under pattern to each article (string object) in the list wiki_articles. Store each processed article into a new list named cleaned_articles.


My answer:

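import re

# Compile once, outside the loop; the pattern does not change per article.
pattern = re.compile(r'={2,}.+')     # not sure about the defined pattern

cleaned_articles = []
for string_object in wiki_articles:  # wiki_articles is provided by the exercise
    processed = pattern.sub(repl='', string=string_object)
    cleaned_articles.append(processed)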
You should make test strings to see what happens, and leave out the loop until you have tested the pattern on a single string first.
import re

string_object = '''\
== History ==
=== Further reading ===
My car is blue
2 + 2 = 4
++= & hello='''

pattern = re.compile(r'={2,}.+')
processed = pattern.sub(repl='', string=string_object)
print(processed.strip())
Output:
My car is blue
2 + 2 = 4
++= & hello=
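So your regex should work fine. I added .strip() to remove the leading newlines that sub leaves behind where the headers were; note that .strip() only trims whitespace at the start and end of the string.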
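One caveat, offered as a sketch rather than a required change: r'={2,}.+' matches from any == to the end of the line, so it would also cut text such as "x == y" in the middle of a sentence. Assuming headers always sit on their own line, as the exercise describes, a stricter pattern can anchor to the line start and end with re.MULTILINE and also consume the trailing newline. The names header_pattern and loose_pattern and the sample articles below are made up for illustration.

```python
import re

# Original pattern from the thread: two or more '=' and the rest of the line.
loose_pattern = re.compile(r'={2,}.+')

# Hypothetical stricter pattern: a full line of the form "== text ==",
# anchored with ^ and $ (re.MULTILINE), plus the newline that follows it.
header_pattern = re.compile(r'^={2,} .+ ={2,}[ \t]*$\n?', flags=re.MULTILINE)

# Made-up sample data standing in for the exercise's wiki_articles list.
wiki_articles = [
    '== History ==\nFounded in 1999.\n=== Further reading ===\nSee x == y for details.',
    'No headers here, just 2 + 2 = 4.',
]

cleaned_articles = [header_pattern.sub('', article) for article in wiki_articles]
loose_result = loose_pattern.sub('', wiki_articles[0])

for article in cleaned_articles:
    print(article)

# repr() makes the leftover newlines and the truncated sentence visible.
print(repr(loose_result))
```

With the stricter pattern the sentence containing "x == y" survives and no blank lines are left behind, so .strip() is no longer needed for this sample. Whether the extra strictness is worth it depends on what the real articles contain.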