Python Forum

Hello,

In the files I need to work with, the value of the http-equiv attribute in the following meta line can be either lower-case or capitalized.

What would be the right way to match it regardless of case, so that the search doesn't miss either variant?

# could be "Content-Type" or "content-type"
meta = soup.head.find("meta",  {"http-equiv":"content-type"})
if meta is None:
  print("here1")
else:
  print("here2")

Thank you.

You could try something like this:

from bs4 import BeautifulSoup

html_doc = """ 
<!DOCTYPE html>
<html lang="en">
<head>
    <meta http-equiv="content-type"  charset="UTF-8">
    <meta http-equiv="Content-Type"  charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>
</head>
<body>
    
</body>
</html>
 """

soup = BeautifulSoup(html_doc, 'lxml')
tags = ['content-type', 'Content-Type']
for tag in tags:
    meta = soup.head.find('meta', {'http-equiv': tag})
    print(meta)

Output:
<meta charset="utf-8" http-equiv="content-type"/>
<meta charset="utf-8" http-equiv="Content-Type"/>
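
Note that the loop only covers the two spellings listed in tags, so a variant such as CONTENT-TYPE would still be missed. Beautiful Soup also accepts a function as an attribute filter, so one option is to lower-case the value before comparing. This is only a rough sketch: the helper name is_content_type is made up for illustration, the sample HTML is shortened, and html.parser is used just to avoid needing lxml.

```python
from bs4 import BeautifulSoup

html_doc = """
<html><head>
<meta http-equiv="CONTENT-TYPE" content="text/html; charset=utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Document</title>
</head><body></body></html>
"""


def is_content_type(value):
    # Hypothetical helper: value is None for tags without an http-equiv attribute.
    return value is not None and value.lower() == "content-type"


soup = BeautifulSoup(html_doc, "html.parser")

# The function is called with each meta tag's http-equiv value.
meta = soup.head.find("meta", attrs={"http-equiv": is_content_type})
if meta is None:
    print("here1")
else:
    print("here2")
```

This keeps the original find() / None check, so the rest of the script should not need to change.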

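Here is a way without looping. The i flag at the end of the CSS attribute selector makes the value comparison case-insensitive: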

from bs4 import BeautifulSoup

html_doc = """ 
<!DOCTYPE html>
<html lang="en">
<head>
    <meta http-equiv="content-type"  charset="UTF-8">
    <meta http-equiv="Content-Type"  charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>
</head>
<body>
    
</body>
</html>
 """

soup = BeautifulSoup(html_doc, 'lxml')

meta = soup.select('meta[http-equiv="content-type" i]')

print(meta)

Output:
[<meta charset="utf-8" http-equiv="content-type"/>, <meta charset="utf-8" http-equiv="Content-Type"/>]
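
Since select() returns a list, it does not drop straight into the original "if meta is None" check. If only the first match is needed, select_one() might fit better, because it returns a single tag or None. A small sketch, with shortened sample HTML and html.parser assumed instead of lxml:

```python
from bs4 import BeautifulSoup

html_doc = """
<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Document</title>
</head><body></body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# The trailing "i" makes the attribute value match case-insensitively.
meta = soup.select_one('meta[http-equiv="content-type" i]')

# select_one() returns None when nothing matches, like find() does.
missing = soup.select_one('meta[http-equiv="refresh" i]')

if meta is None:
    print("here1")
else:
    print("here2")
```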

Winfried

menator01