Regex: a string does not start and end with the same character

Melcu54 · Jul-04-2021, 05:18 AM

So, I have an HTML file with these 4 HTML tags:

<p class="text_obisnuit">Can you provide a little more information on the problem you're trying to solve? Are you iterating through tags programatically?</p>,
<p class="text_obisnuit2">At the end of the day, use the most appropriate tool for the job, even in the cases when that tool happens to be a regex.</p>,
<title>It's true that when programming it's usually best to use dedicated parsers</title>
<meta name="description" content=" I only wrote bebe my class when the XML parsers proved unable to withstand real oana use. Religious downvoting just prevents useful answers from being posted - keep things within mother perspective of the question, please."/>

My code must find and translate only those tags that contain at least 3 of the keywords I put in the regex. In the example above, the meta description tag contains 3 keywords that are also in the regex: bebe|oana|mother. The first 3 regexes work (I tested them), but the 4th regex is skipped by Python. I don't know why, but I believe it is because the regex must start and end with the same string. For example, for the title tag, the regex starts with <title> and ends with </title>.

But for my meta description tag, the regex starts with <meta... and ends with >. If it had ended with meta it would have worked, but it cannot end with
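One way to test that hypothesis is to run the fourth pattern directly against the meta tag as raw text. This is only a minimal sketch: `meta_line` and `pattern4_matches` are names made up for the check, and it assumes the tag reaches `re` exactly as it appears in the HTML file. A regex has no requirement to start and end with the same string, so if the check succeeds, the cause is more likely the text handed to the regex (for example, a tag serialized with its attributes in a different order) than the pattern itself.

```python
import re

# pattern4 copied verbatim from the post's code
pattern4 = r'<meta name="description" content=.*((bebe|oana|mother|sun).*){3,}.*>'

# The meta tag from the sample HTML, as one line of raw text (assumed input)
meta_line = (
    '<meta name="description" content=" I only wrote bebe my class when the XML '
    'parsers proved unable to withstand real oana use. Religious downvoting just '
    'prevents useful answers from being posted - keep things within mother '
    'perspective of the question, please."/>'
)

# re.search returns a match object on success and None on failure
match = re.search(pattern4, meta_line)
pattern4_matches = match is not None
print(pattern4_matches)
```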

from bs4 import BeautifulSoup
from bs4.formatter import HTMLFormatter
from googletrans import Translator
import os
import re
import requests

translator = Translator()


class UnsortedAttributes(HTMLFormatter):
    # Yield attributes in their original order instead of sorting them
    def attributes(self, tag):
        for k, v in tag.attrs.items():
            yield k, v


# Folder containing the HTML files to process
files_from_folder = r"e:\Folder3"

use_translate_folder = False

# Target language code for the translation
destination_language = 'af'

extension_file = ".html"

# Each pattern targets one tag and requires at least 3 keyword occurrences
pattern1 = r'<p class="text_obisnuit">.*((bebe|oana|mother|sun).*){3,}.*</p>'
pattern2 = r'<p class="text_obisnuit2">.*((bebe|oana|mother|sun).*){3,}.*</p>'
pattern3 = r'<title>.*((bebe|oana|mother|sun).*){3,}.*</title>'
pattern4 = r'<meta name="description" content=.*((bebe|oana|mother|sun).*){3,}.*>'

patterns = [pattern1, pattern2, pattern3, pattern4]
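A possible alternative to the nested `{3,}` patterns is to count keyword occurrences in the text of each tag. This is only a sketch: `KEYWORDS`, `keyword_re`, and `has_enough_keywords` are hypothetical names, not part of the original code, and the sketch assumes the tag text or attribute value has already been extracted as a plain string. As in the original patterns, matching is by substring, so a keyword inside a longer word also counts.

```python
import re

# Same keywords as in the post's patterns
KEYWORDS = ("bebe", "oana", "mother", "sun")
keyword_re = re.compile("|".join(map(re.escape, KEYWORDS)))


def has_enough_keywords(text, minimum=3):
    """Return True if text contains at least `minimum` keyword occurrences."""
    return len(keyword_re.findall(text)) >= minimum


# Text taken from the sample tags in the post
meta_content = (
    " I only wrote bebe my class when the XML parsers proved unable to "
    "withstand real oana use. Religious downvoting just prevents useful "
    "answers from being posted - keep things within mother perspective "
    "of the question, please."
)
title_text = "It's true that when programming it's usually best to use dedicated parsers"

print(has_enough_keywords(meta_content))  # True: bebe, oana, mother
print(has_enough_keywords(title_text))    # False: no keywords present
```

Counting this way does not depend on how the tag opens or closes, which sidesteps the question of what the pattern should end with.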
