Python Forum

Hello. I'm not sure about the regex pattern for this exercise, especially how to match the header text between the == signs on both sides so that the whole header is removed. Could you please give me any tips if you know how? Thank you! Angel

My exercise:

Wikipedia uses two or more equal signs == to mark headers and subheaders in the articles (e.g. ==, ===, ====).

In all cases, the equal signs and the actual header text are separated by spaces on both sides, e.g. == History == or === Further reading ===.

Import the re module and define a regular expression that removes all headers and subheaders from the articles. Store this regular expression under a variable named pattern.

Apply the regular expression under pattern to each article (string object) in the list wiki_articles. Store each processed article into a new list named cleaned_articles.
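Following the header format described in the exercise, a stricter pattern can require the line to start and end with the same run of equal signs. This is one possible sketch, not the exercise's official solution, and the sample article text is made up for testing. re.MULTILINE lets ^ and $ match at every line, \1 repeats the captured opening run of = signs, and the trailing \n? also removes the header's line break.

```python
import re

# A header line: 2+ '=' signs, a space, the header text, a space, and the
# same number of '=' signs again (\1 refers back to the first group).
# Trailing spaces/tabs are tolerated; \n? also deletes the line break,
# so no empty line is left where the header was.
pattern = re.compile(r'^(={2,}) .+? \1[ \t]*$\n?', re.MULTILINE)

# Made-up article text for testing.
article = "== History ==\n=== Further reading ===\nMy car is blue\n2 + 2 = 4\n"

cleaned = pattern.sub('', article)
print(repr(cleaned))  # 'My car is blue\n2 + 2 = 4\n'
```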

My answer:

import re

# Compile the pattern once, before the loop; it is the same for every article.
pattern = re.compile(r'={2,}.+')     #not sure about the defined pattern
cleaned_articles = []
for string_object in wiki_articles:
    processed = pattern.sub(repl='', string=string_object)
    cleaned_articles.append(processed)
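The same loop can also be written as a list comprehension. A minimal sketch, assuming the pattern from the post; the two sample articles in wiki_articles are hypothetical, since the exercise's real data isn't shown.

```python
import re

# Hypothetical sample data; the exercise provides its own wiki_articles list.
wiki_articles = [
    "== History ==\nThe town was founded long ago.",
    "=== Further reading ===\nSee the library.",
]

# Compiled once: the pattern does not change between articles.
pattern = re.compile(r'={2,}.+')

# Apply the substitution to every article, keeping the original order.
cleaned_articles = [pattern.sub('', article) for article in wiki_articles]
print(cleaned_articles)
```

Since . does not match a newline, .+ stops at the end of the header line, so each cleaned article still starts with the header's leftover line break.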

You must make test strings to see what happens, and leave the loop out until you have tested the pattern first.

import re

string_object = '''\
== History ==
=== Further reading ===
My car is blue
2 + 2 = 4
++= & hello='''

pattern = re.compile(r'={2,}.+')
processed = pattern.sub(repl='', string=string_object)
print(processed.strip())

Output:
My car is blue
2 + 2 = 4
++= & hello=

So your regex should work fine. I added .strip() to remove the leading newlines that sub() leaves behind where the headers were.
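One caveat worth testing the same way: ={2,}.+ is not anchored to the start of a line, so it also matches a run of equal signs in the middle of ordinary text and deletes the rest of that line. The line "if a == b: print(a)" below is a made-up example of body text that could trigger this. Adding ^ and re.MULTILINE (so ^ matches at the start of every line) is one small fix.

```python
import re

text = "== History ==\nif a == b: print(a)\n"

# The pattern from the post: matches '==' anywhere on a line.
loose = re.compile(r'={2,}.+')
# Same pattern, but only when the '=' run starts a line.
anchored = re.compile(r'^={2,}.+', re.MULTILINE)

loose_result = loose.sub('', text)
anchored_result = anchored.sub('', text)
print(repr(loose_result))     # '\nif a \n'  (body text damaged)
print(repr(anchored_result))  # '\nif a == b: print(a)\n'
```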

xinyulon

snippsat