Preserving anchor tags in BeautifulSoup

graham23s · (This post was last modified: May-18-2019, 09:43 PM by graham23s.)

Hello :)

If I run this test:

print("RAW: This is a <a href=\"https://www.thesite.com/\">test</a> string.")

The string: This is a <a href="https://www.thesite.com/">test</a> string. is printed to the console exactly as typed above.

But when I read the exact same string from the XML file, the console shows: This is a test string. The <a href=""></a> markup is stripped completely.

from bs4 import BeautifulSoup

# projects is assumed to be a response object whose .text holds the XML
soup = BeautifulSoup(projects.text, 'xml')

xml_content_body = soup.find('taskBody')

print("XML: " + xml_content_body.text)  # console shows: This is a test string.
print("RAW: This is a <a href=\"https://www.thesite.com/\">test</a> string.")  # console shows: This is a <a href="https://www.thesite.com/">test</a> string.

The RAW output is the one I need, but when I read the exact same line of text from the XML, the <a href> tag is stripped. I read that get_text() strips all tags and returns only the text, and .text seems to do the same. I will keep debugging the issue :)

regards
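
For anyone hitting the same behaviour, here is a minimal sketch of one way to keep the anchor markup. It assumes the <a> element is a real child element of taskBody (which fits the symptom above: the word "test" survives but the tag does not). The sample_xml string and the task wrapper element are made up for illustration, since the real XML is not shown in the post. The idea is to serialize the children of the tag with decode_contents() instead of asking for text only.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for projects.text; the real XML is not shown in the post.
sample_xml = (
    '<task><taskBody>This is a '
    '<a href="https://www.thesite.com/">test</a> string.</taskBody></task>'
)

# The 'xml' parser requires the lxml package to be installed.
soup = BeautifulSoup(sample_xml, "xml")
body = soup.find("taskBody")

# get_text() (and the .text shortcut) joins only the text nodes,
# so the <a> markup is dropped.
text_only = body.get_text()

# decode_contents() serializes the children of the tag, markup included,
# without the enclosing <taskBody> element itself.
inner_markup = body.decode_contents()

print("TEXT:", text_only)
print("MARKUP:", inner_markup)
```

If the enclosing element is wanted as well, str(body) returns the markup including the taskBody tags.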
