Jul-04-2021, 05:18 AM
So, I have an HTML file with these 4 HTML tags:
<p class="text_obisnuit">Can you provide a little more information on the problem you're trying to solve? Are you iterating through tags programatically?</p>,
<p class="text_obisnuit2">At the end of the day, use the most appropriate tool for the job, even in the cases when that tool happens to be a regex.</p>,
<title>It's true that when programming it's usually best to use dedicated parsers</title>
<meta name="description" content=" I only wrote bebe my class when the XML parsers proved unable to withstand real oana use. Religious downvoting just prevents useful answers from being posted - keep things within mother perspective of the question, please."/>
My code must find and translate only those tags that contain at least 3 of the keywords I put in the regex. In the example above, the meta description tag contains 3 keywords that are also in the regex: bebe|oana|mother. The first 3 regexes work (I tested them), but the 4th regex is skipped by Python. I don't know why, but I believe it is because the regex must start and end with the same string. For example, for the title tag, the regex starts with <title> and ends with </title>.
But for my meta description tag, the regex starts with <meta... and ends with >. If it ended with meta it would have worked, but it cannot end with
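To check the fourth pattern in isolation, here is a minimal sketch, not my full script. It assumes the meta tag sits on a single line exactly as in the example above (by default the `.` in the pattern does not cross line breaks). The names `meta_tag`, `KEYWORDS` and `count_keywords` are made up for this test and are not part of the script.

```python
import re

# Keyword alternation copied from the patterns in the script.
KEYWORDS = r"bebe|oana|mother|sun"

# Fourth pattern, copied unchanged from the script.
pattern4 = r'<meta name="description" content=.*((bebe|oana|mother|sun).*){3,}.*>'

# The meta tag from the example above, kept on a single line (assumption).
meta_tag = (
    '<meta name="description" content=" I only wrote bebe my class when the '
    'XML parsers proved unable to withstand real oana use. Religious '
    'downvoting just prevents useful answers from being posted - keep things '
    'within mother perspective of the question, please."/>'
)


def count_keywords(text):
    """Count keyword occurrences in text (hypothetical helper, substring match)."""
    return len(re.findall(KEYWORDS, text))


# Does the pattern, on its own, match the raw tag text?
pattern4_matches = re.search(pattern4, meta_tag) is not None

# How many keywords does the tag contain?
keyword_hits = count_keywords(meta_tag)

print("pattern4 matches:", pattern4_matches)
print("keyword hits:", keyword_hits)
```

If this reports a match on the raw line, the pattern itself is probably not what makes Python skip the tag, and the cause may be in how the script reads or serializes the tag before the regex is applied. That is only a guess until the rest of the script is checked.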
from bs4 import BeautifulSoup
from bs4.formatter import HTMLFormatter
from googletrans import Translator
import requests
import re

translator = Translator()

class UnsortedAttributes(HTMLFormatter):
    def attributes(self, tag):
        for k, v in tag.attrs.items():
            yield k, v

files_from_folder = r"e:\Folder3"
use_translate_folder = False
destination_language = 'af'
extension_file = ".html"

pattern1 = r'<p class="text_obisnuit">.*((bebe|oana|mother|sun).*){3,}.*</p>'
pattern2 = r'<p class="text_obisnuit2">.*((bebe|oana|mother|sun).*){3,}.*</p>'
pattern3 = r'<title>.*((bebe|oana|mother|sun).*){3,}.*</title>'
pattern4 = r'<meta name="description" content=.*((bebe|oana|mother|sun).*){3,}.*>'
patterns = [pattern1, pattern2, pattern3, pattern4]

import os