Python Forum
Preserving anchor tags in BeautifulSoup
#4
Hello :)

If I run this test:

print("RAW: This is a <a href=\"https://www.thesite.com/\">test</a> string.")
The string This is a <a href="https://www.thesite.com/">test</a> string. is printed to the console exactly as typed above.

But when I get the exact same string from the XML file, the console shows: This is a test string. The <a href=""></a> parts are completely stripped.

from bs4 import BeautifulSoup

# projects is the response object fetched earlier (not shown here)
soup = BeautifulSoup(projects.text, 'xml')

xml_content_body = soup.find('taskBody')

# prints: XML: This is a test string.
print("XML: " + xml_content_body.text)
# prints: RAW: This is a <a href="https://www.thesite.com/">test</a> string.
print("RAW: This is a <a href=\"https://www.thesite.com/\">test</a> string.")
The RAW one is the one I need, but when I get the exact same line of text from the XML, the HTML <a href> tag is stripped. I read that get_text() strips all HTML tags, and .text also seems to strip them. I will keep debugging the issue :)
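
To illustrate the difference, here is a minimal sketch. It assumes the anchor is stored as real child markup inside taskBody (not escaped or wrapped in CDATA), and the xml_doc string and the <task> wrapper are made up for the example, since the real XML is not shown. The 'xml' parser needs lxml installed. Under those assumptions, .text returns only the text nodes, while decode_contents() returns the inner markup with the tags kept:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for projects.text; the real XML file is not shown.
xml_doc = (
    '<task><taskBody>This is a '
    '<a href="https://www.thesite.com/">test</a> string.</taskBody></task>'
)

soup = BeautifulSoup(xml_doc, 'xml')  # the 'xml' parser requires lxml
xml_content_body = soup.find('taskBody')

# .text (same as get_text()) joins only the text nodes, so tags are dropped.
plain_text = xml_content_body.text

# decode_contents() serialises the children of the tag, markup included,
# without the surrounding <taskBody> element itself.
inner_markup = xml_content_body.decode_contents()

print("TEXT:   " + plain_text)
print("MARKUP: " + inner_markup)
```

str(xml_content_body) would also keep the anchor, but it includes the outer <taskBody> tags as well.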

regards


Messages In This Thread
RE: Preserving anchor tags in BeautifulSoup - by graham23s - May-18-2019, 09:41 PM
