Python Forum
[SOLVED] [Beautiful Soup] Move line to top in HTML head?
#1
Hello,

I need to loop through HTML files and make sure the charset line is the very first item in the head, so that the title is displayed correctly.

Do you know of a simple way to do this?

Thank you.

import re
from bs4 import BeautifulSoup

#<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
look_for = re.compile("^Content-Type$", re.I)

#open in binary mode so BS detects the encoding; close the file afterwards
with open(INPUTFILE,'rb') as f:
	soup = BeautifulSoup(f, "lxml")
meta = soup.find("meta", {"http-equiv":look_for})
#if no meta, add one since BS doesn't
if not meta:
	print("No meta")
	metatag = soup.new_tag('meta')
	metatag.attrs['http-equiv'] = 'Content-Type'
	metatag.attrs['content'] = 'text/html; charset=utf-8'
	soup.head.insert(0,metatag) #insert as first line in head
else:
	#check for dups, remove if any
	print("Found meta(s)")
	metas = soup.find_all("meta", {"http-equiv":look_for})
	if len(metas) > 1:
		print("Dups")
		for meta in metas[1:]:
			meta.decompose()
	#at this point, only one line left: Move it to top in head
	#TODO How to move utf-8 line at top in head?
	"""
	 <head>
	  <title>
	   Blah
	  </title>
	  <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
	"""
---
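For the TODO in the snippet above, one possible approach is to detach the surviving tag and re-insert it at index 0 of the head. This is only a sketch under a few assumptions: the helper name move_charset_meta_first is made up for illustration, and the demo uses the built-in "html.parser" instead of "lxml" so it runs without extra dependencies.

```python
import re
from bs4 import BeautifulSoup

LOOK_FOR = re.compile(r"^Content-Type$", re.I)

def move_charset_meta_first(soup):
    """Keep one Content-Type meta tag and make it the first child of <head>.

    Hypothetical helper, not part of Beautiful Soup.
    """
    if soup.head is None:
        return None  # html.parser does not add a missing <head>
    metas = soup.find_all("meta", attrs={"http-equiv": LOOK_FOR})
    if metas:
        # Drop duplicates, then detach the survivor from its current position.
        for dup in metas[1:]:
            dup.decompose()
        meta = metas[0].extract()
    else:
        meta = soup.new_tag("meta")
        meta.attrs["http-equiv"] = "Content-Type"
        meta.attrs["content"] = "text/html; charset=utf-8"
    # Index 0 places the tag before <title> and everything else in <head>.
    soup.head.insert(0, meta)
    return meta

html = (
    "<html><head><title>Blah</title>"
    '<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>'
    "</head><body></body></html>"
)
soup = BeautifulSoup(html, "html.parser")
move_charset_meta_first(soup)
print(soup.head)
```

Since extract() returns the tag it removed, the existing attributes are kept as they were, which may matter if the original tag declares a charset other than utf-8.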
Edit: A less elegant solution that does the job: Removing all the relevant meta lines, and inserting one at the top.

look_for = re.compile("^Content-Type$", re.I)

def insert_meta(soup):
	metatag = soup.new_tag('meta')
	metatag.attrs['http-equiv'] = 'Content-Type'
	metatag.attrs['content'] = 'text/html; charset=utf-8'
	soup.head.insert(0,metatag)

#<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
meta = soup.find("meta", {"http-equiv":look_for})
#if no meta, add one since BS doesn't
if not meta:
	print("No meta")
	insert_meta(soup)
else:
	print("Found meta(s)")
	#remove every matching meta, dups included
	metas = soup.find_all("meta", {"http-equiv":look_for})
	for meta in metas:
		meta.decompose()
	insert_meta(soup)
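To run the remove-and-reinsert approach over a whole folder, something along these lines might work. It is a sketch, not a tested batch tool: fix_file is a hypothetical name, the demo writes to a temporary directory rather than real files, and "html.parser" stands in for "lxml" so nothing beyond bs4 is needed.

```python
import re
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

LOOK_FOR = re.compile(r"^Content-Type$", re.I)

def fix_file(path):
    """Rewrite one HTML file so a single charset meta tag leads <head>."""
    # Pass bytes so Beautiful Soup can work out the source encoding itself.
    soup = BeautifulSoup(path.read_bytes(), "html.parser")
    if soup.head is None:
        return False  # html.parser does not add a missing <head>
    for meta in soup.find_all("meta", attrs={"http-equiv": LOOK_FOR}):
        meta.decompose()
    metatag = soup.new_tag("meta")
    metatag.attrs["http-equiv"] = "Content-Type"
    metatag.attrs["content"] = "text/html; charset=utf-8"
    soup.head.insert(0, metatag)
    # Write UTF-8 bytes so the file matches the charset it declares.
    path.write_bytes(soup.encode("utf-8"))
    return True

# Demo on a throwaway directory; point html_dir at a real folder instead.
with tempfile.TemporaryDirectory() as tmp:
    html_dir = Path(tmp)
    (html_dir / "a.html").write_text(
        "<html><head><title>Blah</title></head><body></body></html>",
        encoding="utf-8",
    )
    for html_file in sorted(html_dir.glob("*.html")):
        fix_file(html_file)
    result = (html_dir / "a.html").read_text(encoding="utf-8")

print(result)
```

This overwrites each file in place, so working on a copy of the folder first is probably wise.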